000 07188cam a2200409 a 4500
005 20250919001733.0
008 141201s2010 maua bi 001 0 eng
020 _a9780136072249 (alk. paper)
_cRM442.00
020 _a0136072240 (alk. paper)
039 9 _a201711081558
_bhamka
_c201709081558
_didah
_c201708071544
_dros
_c201412081556
_dlan
_y12-01-2014
_zbinar
040 _aDLC
_beng
_cDLC
_dYDX
_dBTCTA
_dYDXCP
_dBWX
_dCDX
_dOHS
_dCDN
_dMLN
_dBDX
_dOCLCF
_dOCLCO
_dOCLCQ
_dUKM
_erda
090 _aTK5105.884.C744 3
090 _aTK5105.884
_b.C744 3
100 1 _aCroft, W. Bruce.
245 1 0 _aSearch engines :
_binformation retrieval in practice /
_cW. Bruce Croft, Donald Metzler, Trevor Strohman.
264 1 _aBoston :
_bAddison-Wesley,
_c2010.
300 _axxv, 520 pages :
_billustrations ;
_c24 cm.
336 _atext
_2rdacontent
337 _aunmediated
_2rdamedia
338 _avolume
_2rdacarrier
504 _aIncludes bibliographical references (p. [487]-511) and index.
505 0 _a1 Search Engines and Information Retrieval -- 1.1 What is Information Retrieval? -- 1.2 Search Engines -- 1.3 Search Engineers -- 1.4 Book Overview -- 2 Architecture of a Search Engine -- 2.1 What is an Architecture? -- 2.2 Basic Building Blocks -- 2.3 Breaking It Down -- 2.3.1 Text Acquisition -- 2.3.2 Text Transformation -- 2.3.3 Index Creation -- 2.3.4 User Interaction -- 2.3.5 Ranking -- 2.3.6 Evaluation -- 2.4 How Does It Really Work? -- 3 Crawls and Feeds -- 3.1 Deciding what to search -- 3.2 Crawling the Web -- 3.3 Directory Crawling -- 3.4 Document Feeds -- 3.5 The Conversion Problem -- 3.6 Storing the Documents -- 3.7 Detecting Duplicates -- 3.8 Removing Noise -- 4 Processing Text -- 4.1 From Words to Terms -- 4.2 Text Statistics -- 4.2.1 Vocabulary Growth -- 4.2.2 Estimating Database and Result Set Sizes -- 4.3 Document Parsing -- 4.3.1 Overview -- 4.3.2 Tokenizing -- 4.3.3 Stopping -- 4.3.4 Stemming -- 4.3.5 Phrases and N-grams -- 4.4 Document Structure and Markup -- 4.5 Link Analysis -- 4.5.1 Anchor Text -- 4.5.2 PageRank -- 4.5.3 Link Quality -- 4.6 Information Extraction -- 4.7 Internationalization -- 5 Ranking with Indexes -- 5.1 Overview -- 5.2 Abstract Model of Ranking -- 5.3 Inverted indexes -- 5.3.1 Documents -- 5.3.2 Counts -- 5.3.3 Positions -- 5.3.4 Fields and Extents -- 5.3.5 Scores -- 5.3.6 Ordering -- 5.4 Compression -- 5.4.1 Entropy and Ambiguity -- 5.4.2 Delta Encoding -- 5.4.3 Bit-aligned codes -- 5.4.4 Byte-aligned codes -- 5.4.5 Looking ahead -- 5.4.6 Skipping and Skip Pointers -- 5.5 Auxiliary Structures -- 5.6 Index Construction -- 5.6.1 Simple Construction -- 5.6.2 Merging -- 5.6.3 Parallelism and Distribution -- 5.6.4 Update -- 5.7 Query Processing -- 5.7.1 Document-at-a-time evaluation -- 5.7.2 Term-at-a-time evaluation -- 5.7.3 Optimization techniques -- 5.7.4 Structured queries -- 5.7.5 Distributed evaluation -- 5.7.6 Caching -- 6 Queries and Interfaces -- 6.1 Information Needs and Queries -- 6.2 Query Transformation and Refinement -- 6.2.1 Stopping and Stemming Revisited -- 6.2.2 Spell Checking and Suggestions -- 6.2.3 Query Expansion -- 6.2.4 Relevance Feedback -- 6.2.5 Context and Personalization -- 6.3 Showing the Results -- 6.3.1 Result Pages and Snippets -- 6.3.2 Advertising and Search -- 6.3.3 Clustering the Results -- 6.4 Cross-Language Search -- 7 Retrieval Models -- 7.1 Overview of Retrieval Models -- 7.1.1 Boolean Retrieval -- 7.1.2 The Vector Space Model -- 7.2 Probabilistic Models -- 7.2.1 Information Retrieval as Classification -- 7.2.2 The BM25 Ranking Algorithm -- 7.3 Ranking based on Language Models -- 7.3.1 Query Likelihood Ranking -- 7.3.2 Relevance Models and Pseudo-Relevance Feedback -- 7.4 Complex Queries and Combining Evidence -- 7.4.1 The Inference Network Model -- 7.4.2 The Galago Query Language -- 7.5 Web Search -- 7.6 Machine Learning and Information Retrieval -- 7.6.1 Learning to Rank -- 7.6.2 Topic Models and Vocabulary Mismatch -- 7.7 Application-Based Models -- 8 Evaluating Search Engines -- 8.1 Why Evaluate? -- 8.2 The Evaluation Corpus -- 8.3 Logging -- 8.4 Effectiveness Metrics -- 8.4.1 Recall and Precision -- 8.4.2 Averaging and Interpolation -- 8.4.3 Focusing On The Top Documents -- 8.4.4 Using Preferences -- 8.5 Efficiency Metrics -- 8.6 Training, Testing, and Statistics -- 8.6.1 Significance Tests -- 8.6.2 Setting Parameter Values -- 8.7 The Bottom Line -- 9 Classification and Clustering -- 9.1 Classification and Categorization -- 9.1.1 Naive Bayes -- 9.1.2 Support Vector Machines -- 9.1.3 Evaluation -- 9.1.4 Classifier and Feature Selection -- 9.1.5 Spam, Sentiment, and Online Advertising -- 9.2 Clustering -- 9.2.1 Hierarchical and K-Means Clustering -- 9.2.2 K Nearest Neighbor Clustering -- 9.2.3 Evaluation -- 9.2.4 How to Choose K -- 9.2.5 Clustering and Search -- 10 Social Search -- 10.1 What is Social Search? -- 10.2 User Tags and Manual Indexing -- 10.3 Searching With Communities -- 10.4 Filtering and Recommending -- 10.4.1 Document Filtering -- 10.4.2 Collaborative Filtering -- 10.5 Personalization -- 10.6 Peer-to-Peer and Metasearch -- 10.6.1 Distributed search -- 10.6.2 P2P Networks -- 11 Beyond Bag of Words -- 11.1 Overview -- 11.2 Feature-Based Retrieval Models -- 11.3 Term Dependence Models -- 11.4 Structure Revisited -- 11.4.1 XML Retrieval -- 11.5 Longer Questions, Better Answers -- 11.6 Words, Pictures, and Music -- 11.7 One Search Fits All? -- References -- Index.
520 1 _a'Search Engines: Information Retrieval in Practice introduces the key issues in information retrieval (IR) and shows how they affect the design and implementation of search engines, with mathematical models reinforcing important concepts. This book is ideal for an introductory course on IR at either the undergraduate or master's level or for professionals seeking an authoritative introduction. An extensive set of resources is available to instructors.'--BOOK JACKET.
650 0 _aSearch engines
_xProgramming.
650 0 _aInformation retrieval.
700 1 _aMetzler, Donald.
700 1 _aStrohman, Trevor.
907 _a.b16036244
_b2019-11-12
_c2019-11-12
942 _c01
_n0
_kTK5105.884.C744 3
914 _avtls003574468
990 _arab
990 _ans
991 _aFakulti Teknologi dan Sains Maklumat
998 _al
_b2014-01-12
_cm
_da
_feng
_gmau
_y0
_z.b16036244
999 _c582842
_d582842