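I recently needed to do some text extraction work and discovered Tika, which integrates with Lucene, Solr, Nutch, Mahout, and others, and is very powerful. I also found a related book, 《Tika in Action》.
While browsing LinkedIn's Zoie, I came across another book, 《Lucene in Depth: Advanced Search Techniques with Lucene》. I hope it does not get abandoned halfway the way 《Nutch in Action》 was.
With these, the collection of books on Lucene-related projects is fairly complete. Here is what I have so far: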
Printouts:
《Lucene in Action》, 1st and 2nd editions
《Building Search Applications: Lucene, LingPipe, and Gate》
《Solr 1.4 Enterprise Search Server》
《Lucidworks for Solr Certified Distribution Reference Guide version 1.4》
Bound books:
《Introduction to Information Retrieval》 (reprint edition)
《自己动手写搜索引擎》 (roughly, "Write Your Own Search Engine")
Planning to buy:
《Mining the Web: Discovering Knowledge from Hypertext Data》 (reprint edition)
《The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data》 (reprint edition)
《Elements of Information Theory》, 2nd edition (printout)
《Foundations of Statistical Natural Language Processing》 (printout)
《Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition》, 2nd edition (printout)
Waiting for:
《Tika in Action》
《Lucene in Depth: Advanced Search Techniques with Lucene》