You think the Web is big? In truth, it’s far bigger than it appears. The Web is made up of hundreds of billions of Web documents — far more than the 8 billion to 20 billion claimed by Google or Yahoo.

But most of these Web pages are unreachable by most search engines because they are stored in databases that Web crawlers cannot access. Glenbrook Networks has been working on reaching these documents with technology that crawls databases, automatically completes online forms, and extracts data. Here’s our Merc story today (free registration). The company can do some interesting things, like this map of jobs in Silicon Valley.


4 Trackbacks

  1. Software Only said:

    Glenbrook Networks in the San Jose Mercury News - and Search Engine Watch

    SiliconBeat’s Michael Bazeley featured Glenbrook Networks co-founders Julia and Edward Komissarchik, and the Glendor showcase, in a great piece about Deep Web search and information extraction. Michael summarized it quite well: Komissarchik and her fath…

  2. Glendor.com Blog said:

    Glenbrook Networks in the San Jose Mercury News

    SiliconBeat’s Michael Bazeley featured Glenbrook Networks co-founders Julia and Edward Komissarchik, and the Glendor showcase, in a great piece about “Deep Web” search and information extraction. Michael summarized it quite well:

  3. AI3 - Adaptive Information::: said:

    Intellectual Honesty, Attribution, Historical Revisionism, and Truth: The ‘Deep Web’ Example

    Last week I came across a reference from Search Engine Watch (to which I have subscribed for many years, and at whose conferences I have spoken) that TOTALLY FRIED me. It’s related to a topic near…

  4. June 24th, 2006
    7:02 pm

    health savings account said:

    health savings account

    I don’t really exist therefore I sing.

4 Comments

  1. Dave McClure said:

    Glenbrook does have some interesting technology. and there’s definitely more stuff out there than even the very broad (but thin) crawls that Yahoo and Google do. most folks aren’t aware of what tools like these or Kapow & Transparensee can offer, in addition to other in-house proprietary technology. still, having a particular vertical focus makes it easier to compete with the big guys.

    fyi, Glenbrook et al aren’t the only folks doing a job map mashup:
    http://tinyurl.com/de8kj

    look for more data visualization coming soon from several corners…

  2. Peter Rip said:

    This is interesting but a bit vexing. The essence of a search engine is its generality. Limiting the ‘showcase’ to the Bay Area undermines that notion of generality. A more compelling showcase would have shown how the engine works with an arbitrary location. It is not that difficult to hand-code a solution for one limited market. What this showcase shows is just a concept.

  3. Jeff Clavier said:

    Peter> Glenbrook can turn its web trawlers to any location or set of companies; at the end of the day, it depends on the amount of gear thrown at the problem.
    The reason for focusing on Bay Area companies was to get a meaningful set of results for the sort of tech jobs users of the showcase might look for, and deliver some value to them whilst putting our technology stack to use in a real context.

    For me, a concept prototype (like a concept car) is a one-time realization from the labs that will be used one day to build real products. This showcase leverages 4 years of R&D in the field of information extraction, and could easily grow to millions of jobs by scaling the back-end infrastructure. And since the system is generic, i.e. it does not use a templating system to extract information from web sites, scalability is not an issue.

    Feel free to get in touch to discuss further.

  4. Yuri Ammosov said:

    Matt: if you search this engine for “CTO”, the results will include every “DireCTOr” entry. This hardly counts as a high level of search sophistication. In fact, failing to distinguish a substring from a whole word is an entry-level error, especially for such common job-related words. It is good to have a powerful crawler, but that alone does not deliver user value. This team’s previous venture failed in part due to the same lack of attention to detail; this time, they have to make sure there are no blunders like this.
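
The substring-versus-word problem raised in the last comment can be illustrated with a minimal sketch. This is not Glenbrook's code or how its showcase is implemented; the function names and the sample job titles below are hypothetical, chosen only to show why a plain substring test returns “Director” for the query “CTO” while a word-boundary match does not.

```python
import re

def substring_match(query, title):
    # Naive test: true whenever the query letters appear anywhere in the title.
    return query.lower() in title.lower()

def whole_word_match(query, title):
    # Word-boundary test: the query must stand alone as a word.
    pattern = r"\b" + re.escape(query) + r"\b"
    return re.search(pattern, title, flags=re.IGNORECASE) is not None

# Hypothetical job titles, for illustration only.
titles = [
    "Director of Engineering",
    "CTO, Platform Group",
    "Assistant to the CTO",
]

naive = [t for t in titles if substring_match("CTO", t)]
strict = [t for t in titles if whole_word_match("CTO", t)]

print(naive)   # includes "Director of Engineering"
print(strict)  # only the titles where CTO is a separate word
```

A word-boundary pattern is only one way to draw the distinction; a search engine would more likely tokenize titles at index time, but the effect on this query is the same.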
