IMYours
#1
12-15-2004, 01:07 AM
onauc
Junior Member
Join Date: Dec 2004
Posts: 10
google searchengine programming

Howdy,

I want to know how search engines such as Google and WebCrawler really work, because I am learning PHP programming and want to write a search engine.
I have read around 10 websites about “how search engines work” that I found on Google, and not a single one of them makes clear whether it is the spider, the index or the search software that does the ranking according to the engine’s ranking algorithm.
All they ever say is that a search engine has three pieces of software:
a) the spider
b) the index
c) the search system (search-box, template, etc.)
The spiders crawl the web collecting web pages and forward them to the index; the search software then searches the index for the requested keywords or phrases.
Also, some say that the spider copies the whole website into its index. In other words, there are two copies of a website: one on the website owner’s web server and the other in the search engine’s index.
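A minimal, runnable sketch of that three-part flow, written in Python rather than PHP just to keep it short. Everything in it is an assumption for illustration: the two-page `WEB` dictionary stands in for real HTTP fetches, links are pulled out with a regular expression instead of a proper HTML parser, and `spider`, `build_index` and `search` are hypothetical names. It also shows the "two copies" idea: the page in `WEB` and the engine's own copy in the index.

```python
import re
from collections import deque

# Hypothetical two-page "web" (URL -> HTML) so the sketch runs offline.
WEB = {
    "http://a.example/": '<a href="http://b.example/">B</a> php search engine',
    "http://b.example/": '<a href="http://a.example/">A</a> php tutorial',
}

LINK_RE = re.compile(r'href="([^"]+)"')


def tokens(html):
    """Crudely strip tags and split into lowercase words."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z0-9]+", text.lower())


def spider(start_url, fetch, limit=100):
    """Visit pages breadth-first and return {url: html} copies."""
    seen, queue, copies = set(), deque([start_url]), {}
    while queue and len(copies) < limit:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        if html is None:  # unreachable page
            continue
        copies[url] = html  # the engine's own copy of the page
        queue.extend(LINK_RE.findall(html))
    return copies


def build_index(copies):
    """The 'index' here just keeps the copies; nothing is ranked."""
    return dict(copies)


def search(index, word):
    """Return every URL whose stored copy contains the word (unranked)."""
    word = word.lower()
    return [url for url, html in index.items() if word in tokens(html)]


copies = spider("http://a.example/", WEB.get)
index = build_index(copies)
print(search(index, "php"))       # ['http://a.example/', 'http://b.example/']
print(search(index, "tutorial"))  # ['http://b.example/']
```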
So from all this, I can only come up with three possibilities for how a search engine works:

1.
The spider does not do the ranking according to any algorithm.
All it does is visit a website, grab all its HTML code (copy the website) and then dump that HTML into the index.
The index is nothing but a big text file (.txt, .html) on the search engine’s web server that keeps a full copy (the HTML code) of each website.
The search system, when it searches the index and finds links, ranks them according to the search engine’s ranking algorithm.
This means neither the spider nor the index is responsible for the ranking, because these two parts of the search engine are not taught the ranking algorithm.
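Assumption 1 can be sketched like this (Python, hypothetical names throughout): the index holds only the stored pages, and `score`, a made-up rule that simply counts how often the query words occur, runs inside `search`, so nothing is ranked until someone actually searches.

```python
import re


def tokens(html):
    """Crudely strip tags and split into lowercase words."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z0-9]+", text.lower())


# Assumption 1: the index is only the stored copies (URL -> HTML).
INDEX = {
    "http://a.example/": "<p>php php search engine</p>",
    "http://b.example/": "<p>php tutorial</p>",
    "http://c.example/": "<p>cooking recipes</p>",
}


def score(page_words, query_words):
    """Hypothetical ranking rule: total occurrences of the query words."""
    return sum(page_words.count(w) for w in query_words)


def search(index, query):
    """All ranking happens here, at the moment the query arrives."""
    query_words = tokens(query)
    scored = []
    for url, html in index.items():
        s = score(tokens(html), query_words)
        if s > 0:
            scored.append((s, url))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, then by URL
    return [url for _, url in scored]


print(search(INDEX, "php search"))  # ['http://a.example/', 'http://b.example/']
```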

OR

2.
The spider does the ranking according to the searchengine’s ranking algorithm.
It visits a website, grabs all its HTML code (copies the website) and finally dumps that HTML into the index. As it dumps the copies of the websites, it ranks them according to the search engine’s algorithm.
The index is nothing but a big text file (.txt, .html) on the search engine’s web server that keeps a full copy (the HTML code) of each website.
The search system, when it searches the index and finds links, does not rank them according to the search engine’s ranking algorithm, because the spider already did that when it dumped the data into the index.
This means the spider is responsible for the ranking, and neither the index nor the search system is, because these two parts of the search engine are not taught the ranking algorithm.
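A sketch of assumption 2 under the same caveats: `store` plays the spider, saving each page's words together with a score computed on the spot, and `search` merely sorts by that stored number. The scoring rule (number of distinct words) is invented purely for illustration. One consequence of this design is worth noticing: because the score is fixed before any query exists, the order of two pages is the same for every keyword they share, whatever the searcher typed.

```python
import re


def tokens(html):
    """Crudely strip tags and split into lowercase words."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z0-9]+", text.lower())


def crawl_time_score(html):
    """Hypothetical score fixed when the page is stored:
    here, simply the number of distinct words on the page."""
    return len(set(tokens(html)))


def store(index, url, html):
    """Assumption 2: the spider saves the page's words AND its score."""
    index[url] = {"words": set(tokens(html)), "score": crawl_time_score(html)}


def search(index, word):
    """No ranking rule here: filter by word, then sort by the stored score."""
    hits = [(entry["score"], url) for url, entry in index.items()
            if word.lower() in entry["words"]]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in hits]


index = {}
store(index, "http://a.example/", "<p>php search engine tutorial</p>")
store(index, "http://b.example/", "<p>php php</p>")
print(search(index, "php"))  # ['http://a.example/', 'http://b.example/']
```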

OR

3.
The spider does not do the ranking according to any algorithm.
All it does is visit a website, grab all its HTML code (copy the website) and then dump that HTML into the index.
The index is not only a big text file (.txt, .html) on the search engine’s web server that keeps a full copy (the HTML code) of each website, but also the system that does the ranking.
When it receives data from the spider, it ranks the links in its database according to the search engine’s ranking algorithm.
The search system, when it searches the index and finds links, does not rank them according to the search engine’s ranking algorithm.
In fact, all it does is output a copy of certain parts of the index onto the searcher’s screen.
This means neither the spider nor the search system is responsible for the ranking, because these two parts of the search engine are not taught the ranking algorithm.
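Assumption 3 can be sketched as an index that keeps, for every word, a list of pages already sorted best-first, updating it as each page arrives from the spider; `search` then only copies out that list. The class name `RankingIndex` and the rule "more occurrences of the word ranks higher" are hypothetical. Unlike the assumption 2 sketch, the stored order can differ from word to word, because it is computed per word rather than once per page.

```python
import re
from collections import defaultdict


def tokens(html):
    """Crudely strip tags and split into lowercase words."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z0-9]+", text.lower())


class RankingIndex:
    """Assumption 3: the index ranks pages as the spider hands them over."""

    def __init__(self):
        # word -> list of (score, url), kept sorted best-first
        self.postings = defaultdict(list)

    def add_page(self, url, html):
        counts = defaultdict(int)
        for word in tokens(html):
            counts[word] += 1
        for word, tf in counts.items():
            plist = self.postings[word]
            plist.append((tf, url))
            # Hypothetical ranking rule: more occurrences of the word first.
            plist.sort(key=lambda pair: (-pair[0], pair[1]))

    def search(self, word):
        """The search system only copies out the pre-ranked list."""
        return [url for _, url in self.postings.get(word.lower(), [])]


idx = RankingIndex()
idx.add_page("http://a.example/", "<p>php</p>")
idx.add_page("http://b.example/", "<p>php php php</p>")
print(idx.search("php"))  # ['http://b.example/', 'http://a.example/']
```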


So, which of the three assumptions above is correct?


OK, I am not thinking of competing with Google, but I want to run a search engine that has a spider, an index and a search facility, and I should be able to teach it my own ranking algorithms.
The web scripts out there do not let the admin teach his search engine (which runs on these ready-made scripts) his own ranking algorithm.
The company that developed the script built the ranking algorithms, and we admins cannot change them.
The major search engines can change their ranking algorithms from time to time when they find out that webmasters have guessed them and are abusing them to get irrelevant websites ranked high under every keyword under the sky.
For example:
I run a search engine using a ready-made web script. One day my search engine gets popular, and you decide to get traffic to your website from it.
You check which ready-made script I am using, buy that script, experiment with it and find out the ranking algorithm.
Now you falsely optimise your website so that it ranks high under every keyword on my search engine, even keywords that are not really related to your website. Sooner or later, people abandon my search engine and my venture comes to a dead end.
To avoid all this, I must be able to change my ranking algorithm when I find out that webmasters have worked it out and are abusing it.
Typically, these ready-made search engine scripts do not let the admin change the ranking algorithm or create his own.
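Being able to "teach" the engine a new algorithm can be sketched by passing the ranking rule in as a function, so the admin can swap it without touching the spider or the stored pages. The two rules here are hypothetical examples: `raw_count` rewards every repetition of a query word, which a keyword-stuffed page can exploit, while `capped_count` stops counting after `cap` repetitions. Capping is just one illustrative countermeasure, not what any particular engine does.

```python
import re
from typing import Callable, Dict, List

# A ranking rule takes (page words, query words) and returns a score.
Ranker = Callable[[List[str], List[str]], int]


def tokens(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def raw_count(page, query):
    """Naive rule: every repetition of a query word adds to the score."""
    return sum(page.count(w) for w in query)


def capped_count(page, query, cap=3):
    """Alternative rule: repetitions beyond `cap` earn nothing extra."""
    return sum(min(page.count(w), cap) for w in query)


class SearchEngine:
    def __init__(self, ranker: Ranker):
        self.ranker = ranker
        self.pages: Dict[str, List[str]] = {}  # url -> words

    def set_ranker(self, ranker: Ranker) -> None:
        """Swap the algorithm; stored pages are left untouched."""
        self.ranker = ranker

    def add(self, url: str, text: str) -> None:
        self.pages[url] = tokens(text)

    def search(self, query: str) -> List[str]:
        q = tokens(query)
        scored = [(self.ranker(words, q), url) for url, words in self.pages.items()]
        scored = [(s, url) for s, url in scored if s > 0]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [url for _, url in scored]


engine = SearchEngine(raw_count)
engine.add("http://honest.example/", "php tutorial: php arrays, php loops, php forms")
engine.add("http://spam.example/", "php " * 50 + "cheap pills")
print(engine.search("php tutorial"))  # ['http://spam.example/', 'http://honest.example/']
engine.set_ranker(capped_count)
print(engine.search("php tutorial"))  # ['http://honest.example/', 'http://spam.example/']
```

With `raw_count` the stuffed page wins; after `set_ranker(capped_count)` the honest page comes first, and nothing had to be re-crawled.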

Also, what is a peer-to-peer search engine?
#2
12-16-2004, 12:09 AM
temi
Senior Member
Join Date: Aug 2004
Location: London
Posts: 3,197

You make an excellent point about the limitations of off-the-shelf scripts, but writing a search engine is no task for an individual. I think you will need quite a few associates to get the project off the ground.
#3
12-16-2004, 04:56 PM
Darksat
Member
Join Date: Dec 2004
Posts: 34

Most new search engines such as killerinfo.com just pull results from other engines.

As far as I am aware, though, the spider grabs the information and puts it in the index, and then the data in the index is tabulated into a database that is used to return results.
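The pipeline described here (spider, then index, then a database table used to answer queries) might look roughly like the following, using Python's built-in `sqlite3` module with an in-memory database. The `postings` table layout (word, url, hits) and the function names `tabulate` and `lookup` are assumptions for illustration, not how any particular engine stores its data.

```python
import re
import sqlite3


def tokens(html):
    """Crudely strip tags and split into lowercase words."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z0-9]+", text.lower())


def tabulate(conn, pages):
    """Turn raw pages (url -> html) into a (word, url, hits) table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings ("
        "word TEXT, url TEXT, hits INTEGER, PRIMARY KEY (word, url))"
    )
    for url, html in pages.items():
        counts = {}
        for w in tokens(html):
            counts[w] = counts.get(w, 0) + 1
        conn.executemany(
            "INSERT OR REPLACE INTO postings (word, url, hits) VALUES (?, ?, ?)",
            [(w, url, n) for w, n in counts.items()],
        )
    conn.commit()


def lookup(conn, word):
    """Answer a one-word query straight from the table, most hits first."""
    rows = conn.execute(
        "SELECT url FROM postings WHERE word = ? ORDER BY hits DESC, url",
        (word.lower(),),
    )
    return [url for (url,) in rows]


conn = sqlite3.connect(":memory:")
tabulate(conn, {"http://a.example/": "<p>php</p>",
                "http://b.example/": "<p>php php mysql</p>"})
print(lookup(conn, "php"))  # ['http://b.example/', 'http://a.example/']
```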
#4
12-16-2004, 05:41 PM
temi
Senior Member
Join Date: Aug 2004
Location: London
Posts: 3,197

I don't think the spider at killer.info adds the data to its own database. I think it just grabs the data from the third-party search engine and serves it to the searcher; that way the proper engines do all the hard work of spidering etc. It's a bit like the meta search engine I will be integrating into www.haabaa.com when I have some time.
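A meta search engine of the kind described here can be sketched as a function that asks several backends for their ranked lists and merges them. The backends below are hypothetical stand-ins that return fixed lists (a real one would query a third-party engine), and the merge uses reciprocal rank fusion with the commonly used constant k = 60, a standard technique for combining ranked lists that is swapped in here; nothing in the thread says killer.info or haabaa.com merge results this way.

```python
from typing import Callable, Dict, List

# A backend takes a query and returns URLs, best first.
Backend = Callable[[str], List[str]]


def engine_one(query: str) -> List[str]:
    """Hypothetical stand-in for a third-party engine."""
    return ["http://a.example/", "http://b.example/", "http://c.example/"]


def engine_two(query: str) -> List[str]:
    """Another hypothetical stand-in."""
    return ["http://b.example/", "http://d.example/"]


def meta_search(query: str, backends: List[Backend], k: int = 60) -> List[str]:
    """Merge ranked lists with reciprocal rank fusion (RRF)."""
    scores: Dict[str, float] = {}
    for backend in backends:
        for rank, url in enumerate(backend(query), start=1):
            # Each appearance adds 1 / (k + rank); high ranks count more.
            scores[url] = scores.get(url, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda u: (-scores[u], u))


print(meta_search("php", [engine_one, engine_two]))
# ['http://b.example/', 'http://a.example/', 'http://d.example/', 'http://c.example/']
```

Page b sits near the top of both lists, so it comes out first even though neither backend alone would put it ahead of every other page.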
#5
01-18-2005, 08:26 AM
alexandru
Senior Member
Join Date: Aug 2004
Location: Oradea, Romania
Posts: 135

Not to mention the hardware needed for a search engine... Google uses a farm of 4000 clustered PCs running Debian Linux :)
So even if you want only 1/100 of their computational power, you still need 40 PCs...
All times are GMT +1.