Apache Solr is a search server focused on full-text search, relevancy, and performance. It builds on the Apache Lucene search library.
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform applications.

Nutch is web search software. It builds on the Apache Lucene search library, adding a crawler, a web database (including a full link graph), plugins for various document formats, a user interface, and more.
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.
Hadoop includes these subprojects: Hadoop Common, Avro, Chukwa, HBase, HDFS, Hive, MapReduce, Pig and ZooKeeper.
Lucy will be a loose C port of the Java Lucene search engine library, with Perl and Ruby bindings.
The general@lucene mailing list is for discussions about the top-level Apache Lucene project and matters that affect all subprojects. It is also a suitable place to ask questions when you aren't sure which subproject would be most useful for addressing a particular problem or use case.

Mahout's goal is to build scalable, Apache-licensed machine learning libraries. Initially, we are interested in building out the ten machine learning algorithms detailed in nips06-mapreducemulticore.pdf using Hadoop. While these algorithms are our initial focus, we welcome contributions of other machine learning approaches.

