Topic: Web search and online communities
content quality: we are now getting better content, such as Flickr
content growth: professional content
personal content
total web growth: 1-3 M/day
published: 3-4GB/day
professional web: 2 GB/day
social web: ~5-10 GB/day
private text: 2 TB/day
upper bound: 140 TB/day
fragmentation: content ownership is fragmenting
Yahoo's share of web content: >10% of the web
Metcalfe's law: the community value of a network grows as the square of the number of its users.
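To make "the square of the number of users" concrete, here is a minimal Python sketch. It assumes the usual reading of Metcalfe's law, where value is proportional to n² (the constant k is arbitrary), and it also counts the distinct user-to-user links, n(n-1)/2, which is the quantity that actually grows quadratically. The function names are hypothetical.

```python
def metcalfe_value(n: int, k: float = 1.0) -> float:
    """Modeled network value: proportional to n^2 (k is an arbitrary constant)."""
    return k * n * n


def pairwise_links(n: int) -> int:
    """Distinct user-to-user connections among n users: n(n-1)/2, which grows like n^2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for users in (10, 100, 1000):
        print(users, pairwise_links(users), metcalfe_value(users))
```

Doubling the user base from 100 to 200 quadruples the modeled value, which is the intuition behind wanting to own a large community.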
social media challenges: find it: find the right data, which is original and worth indexing
combine it: combine various types of data and media.
coping with scale: upper bound: 140 TB/day
me: it's a simplistic way of calculating the data; data growth is not a linear function.
storage: 52 PB/yr
cost: $25 million/yr
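The storage figure is essentially the daily upper bound multiplied out over a year: 140 TB/day × 365 ≈ 51.1 PB, roughly the 52 PB quoted, and $25 million over 52 PB works out to about $0.48 per GB per year. The sketch below reproduces that linear arithmetic and, to illustrate the point that data growth is not linear, adds a compounding variant. The 0.1% daily growth rate is purely hypothetical, chosen only to show the difference.

```python
# Decimal units: 1 PB = 1,000 TB = 1,000,000 GB.
TB_PER_PB = 1_000
GB_PER_PB = 1_000_000


def yearly_storage_pb(tb_per_day: float, days: int = 365) -> float:
    """Linear estimate: the same daily volume every day of the year."""
    return tb_per_day * days / TB_PER_PB


def yearly_storage_compounding_pb(tb_first_day: float, daily_growth: float,
                                  days: int = 365) -> float:
    """Hypothetical non-linear estimate: the daily volume grows by
    `daily_growth` (e.g. 0.001 = 0.1%) each day, compounding."""
    total_tb = sum(tb_first_day * (1 + daily_growth) ** day for day in range(days))
    return total_tb / TB_PER_PB


def cost_per_gb_year(dollars_per_year: float, pb_per_year: float) -> float:
    """Implied storage cost in dollars per GB stored per year."""
    return dollars_per_year / (pb_per_year * GB_PER_PB)


if __name__ == "__main__":
    linear = yearly_storage_pb(140)
    growing = yearly_storage_compounding_pb(140, 0.001)  # assumed 0.1%/day growth
    print(f"linear:      {linear:.1f} PB/yr")
    print(f"compounding: {growing:.1f} PB/yr")
    print(f"cost:        ${cost_per_gb_year(25e6, 52):.2f} per GB per year")
```

With zero growth the two estimates coincide; any positive growth rate pushes the yearly total above the linear figure.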
key things in generating data and owning a part of the internet: gathering content
making deals
working with users
understanding the content: indexing/search toolkits such as Lucene, Zettair, Lemur
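Lucene, Zettair, and Lemur all rest on an inverted index, a map from each term to the documents that contain it. The following Python sketch shows that core data structure with a simple AND query. It is a toy illustration, not the API of any of those toolkits; the tokenizer (lowercased alphanumeric runs) and the sample documents are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Assumed tokenizer: lowercase, keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: term -> set of ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Conjunctive (AND) query: ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())  # .get avoids inserting into the defaultdict
    return result


DOCS = {  # made-up sample documents
    "d1": "Flickr photos of Paris",
    "d2": "Paris local search",
    "d3": "user generated photos",
}

if __name__ == "__main__":
    idx = build_index(DOCS)
    print(search(idx, "paris photos"))
```

Real engines add ranking, positional postings, and compression on top of this, but the term-to-postings map is the shared foundation.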
new in web search: class-specific query processing (QP)
vertical alignment
inlined content from third parties
new interfaces, as in MSN web search
user-contributed structures
integrated UGC (user-generated content)
simple structured queries
query correction
local search
structured content
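Query correction can be illustrated with a common textbook approach: suggest the vocabulary word with the smallest Levenshtein edit distance, breaking ties by term frequency. This is a minimal Python sketch under that assumption, not how MSN, Yahoo, or any particular engine implements it; the vocabulary and its counts are made up, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]


def correct_term(term: str, vocabulary: dict[str, int], max_distance: int = 2) -> str:
    """Closest vocabulary word within max_distance (ties go to the more
    frequent word); the term is returned unchanged if nothing is close."""
    best = None
    for word, freq in vocabulary.items():
        d = edit_distance(term, word)
        if d <= max_distance:
            key = (d, -freq)
            if best is None or key < best[0]:
                best = (key, word)
    return best[1] if best else term


def correct_query(query: str, vocabulary: dict[str, int]) -> str:
    return " ".join(correct_term(t, vocabulary) for t in query.lower().split())


VOCAB = {"flickr": 50, "photos": 80, "paris": 60, "search": 90}  # made-up counts

if __name__ == "__main__":
    print(correct_query("flikr photso", VOCAB))
```

Here "flikr" is one insertion away from "flickr" and "photso" is two substitutions away from "photos", so the query is rewritten to "flickr photos".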
where research has mostly been done: core search
local search
where it has not been done: effectively using desktop real estate
providing integrated content to users
integrating social networking into search.
Questions: how to monetize AJAX-based websites?
how do search engines crawl graphics-heavy and AJAX sites?