<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jason Rowe &#187; Lucene.Net</title>
	<atom:link href="http://jasonrowe.com/tag/lucene-net/feed/" rel="self" type="application/rss+xml" />
	<link>http://jasonrowe.com</link>
	<description>enjoying the web</description>
	<lastBuildDate>Sun, 13 May 2012 14:05:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>Lucene in Action .Net Samples</title>
		<link>http://jasonrowe.com/2010/06/23/lucene-in-action-net-samples/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=lucene-in-action-net-samples</link>
		<comments>http://jasonrowe.com/2010/06/23/lucene-in-action-net-samples/#comments</comments>
		<pubDate>Thu, 24 Jun 2010 03:22:39 +0000</pubDate>
		<dc:creator>Jason</dc:creator>
				<category><![CDATA[.Net]]></category>
		<category><![CDATA[Lucene.Net]]></category>

		<guid isPermaLink="false">http://jasonrowe.com/?p=1185</guid>
		<description><![CDATA[I started reading Lucene in Action by Otis Gospodnetić and Erik Hatcher. Lucene is a high-performance, scalable Information Retrieval (IR) library. The library is in Java but I’m using the book to understand the Lucene.Net port. Why &#8230; <a href="http://jasonrowe.com/2010/06/23/lucene-in-action-net-samples/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I started reading <a href="http://www.manning.com/hatcher3/">Lucene in Action</a> by Otis Gospodnetić and Erik Hatcher. Lucene is a high-performance, scalable Information Retrieval (IR) library. The library is in Java but I’m using the book to understand the <a href="http://lucene.apache.org/lucene.net/">Lucene.Net</a> port.</p>  

<p>Why buy a Java book to learn a .NET library? I couldn’t find a book specifically for the .NET version. I also wanted a 1,000-foot view of search technologies, because this is my first time working with IR. I was pleased to find that the .NET version closely mirrors the Java version presented in the book.</p>  

<p>In some ways, learning from a book written for a different language has been beneficial: I invested more time in the samples because I had to convert them to C#. I’ve also read <a href="http://www.amazon.com/Mathematics-Physics-Programmers-Game-Development/dp/1584503300">Mathematics and Physics for Programmers</a>, which is written entirely in a pseudo-language. Again, it was lots of fun to convert the presented samples to a language I was more comfortable with.</p> 

<p>It was also nice to find that the open source project <a href="http://code.google.com/p/subtext/source/browse/trunk/src/Subtext.Framework/Services/SearchEngine/SearchEngineService.cs">Subtext</a> is using Lucene.Net. They’ve worked out some locking issues recently and seem to be getting things set up in a nice way. Also, my coworkers Kevin and Tim put together a nice library for Lucene.Net called <a href="http://code.google.com/p/activelucenenet/">ActiveLucene.Net &#8211; Attributed Lucene.Net</a>. The largest project seems to be <a href="http://linqtolucene.codeplex.com/">Linq to Lucene</a>, but I haven&#8217;t looked at it yet. It was helpful to see how others are integrating Lucene into the Microsoft Web Platform.

</p><p>
Anyway, here are the indexer and searcher in my C# interpretation. They are the first snippets presented in the book and a great way to get a jump start on the basics.
</p>

<p><strong>Indexer</strong></p> 

<pre class="brush: csharp; title: ; notranslate">   
readonly static SimpleFSLockFactory _LockFactory = new SimpleFSLockFactory();

static void Main(string[] args)
{

    var dataPath = ConfigurationManager.AppSettings[&quot;DataDirectory&quot;];

    var indexPath = ConfigurationManager.AppSettings[&quot;IndexDirectory&quot;];

    if (!System.IO.Directory.Exists(dataPath))
    {
        throw new IOException(dataPath + &quot; directory does not exist&quot;);
    }

    DirectoryInfo indexInfo = new DirectoryInfo(indexPath);

    DirectoryInfo dataInfo = new DirectoryInfo(dataPath);

    Lucene.Net.Store.Directory indexDir = Lucene.Net.Store.FSDirectory.Open(
                                                     indexInfo, _LockFactory);

    // Stopwatch measures elapsed time directly; subtracting TimeOfDay values breaks across midnight.
    var stopwatch = System.Diagnostics.Stopwatch.StartNew();
    var numIndexed = Index(indexDir, dataInfo);
    stopwatch.Stop();

    Console.WriteLine(
       &quot;Indexing &quot; + numIndexed + &quot; files took &quot; +
       stopwatch.ElapsedMilliseconds + &quot; milliseconds&quot;);
}

public static int Index(Lucene.Net.Store.Directory indexDir, DirectoryInfo dataInfo)
{
    var writer = new IndexWriter(indexDir, new StandardAnalyzer(
                                      Lucene.Net.Util.Version.LUCENE_29), 
                                      IndexWriter.MaxFieldLength.UNLIMITED);
    writer.SetMergePolicy(new LogDocMergePolicy(writer));
    writer.SetMergeFactor(5);

    try
    {
        foreach (var file in dataInfo.EnumerateFiles(&quot;*.txt&quot;))
        {
            IndexFile(writer, file);
        }

        var numIndexed = writer.MaxDoc();
        writer.Optimize();

        return numIndexed;
    }
    finally
    {
        // Close the writer on success and on failure so the index lock is released.
        // Exceptions propagate unchanged, with their original stack trace.
        writer.Close();
    }
}

private static void IndexFile(IndexWriter writer, FileInfo file)
{
    if (!file.Exists)
    {
        return;
    }

    Console.WriteLine(&quot;Indexing &quot; + file.Name);

    Document doc = new Document();

    var path = file.FullName;

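    // The reader-backed field is only read when AddDocument runs,
    // so the reader has to stay open until the document has been added.
    using (System.IO.TextReader readFile = new StreamReader(path))
    {
        doc.Add(new Field(&quot;contents&quot;, readFile));

        doc.Add(new Field(&quot;filename&quot;, file.Name,
            Field.Store.YES,
            Field.Index.ANALYZED,
            Field.TermVector.YES));

        writer.AddDocument(doc);
    }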
}
</pre>   
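
<p>The indexer reads its two paths from application settings. As a minimal sketch, assuming a standard <code>App.config</code> in the console project and a reference to <code>System.Configuration</code>, the settings might look like this. The key names come from the code above; the two directory values are placeholders I made up, so replace them with your own paths.</p>

<pre class="brush: xml; title: ; notranslate">
&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot;?&gt;
&lt;configuration&gt;
  &lt;appSettings&gt;
    &lt;!-- Hypothetical paths: the folder of *.txt files to index, and where the index is written --&gt;
    &lt;add key=&quot;DataDirectory&quot; value=&quot;C:\lucene\data&quot; /&gt;
    &lt;add key=&quot;IndexDirectory&quot; value=&quot;C:\lucene\index&quot; /&gt;
  &lt;/appSettings&gt;
&lt;/configuration&gt;
</pre>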

<p><strong>Searcher</strong></p> 

<pre class="brush: csharp; title: ; notranslate">   
public static SimpleFSLockFactory _LockFactory = new SimpleFSLockFactory();

static void Main(string[] args)
{
    if (args.Length &lt; 2)
    {
        Console.WriteLine(&quot;Searcher takes two parameters&quot;);
        Console.WriteLine(&quot;Usage: ConsoleSearcher &lt;index dir&gt; &lt;query&gt;&quot;);
        return; // without both arguments, reading args[0] and args[1] below would throw
    }

    var indexInfo = new DirectoryInfo(args[0]);
    var query = args[1];

    if (!System.IO.Directory.Exists(args[0]))
    {
        throw new IOException(args[0] + &quot; directory does not exist&quot;);
    }

    SearchOption(indexInfo, query);
}

private static void SearchOption(DirectoryInfo indexInfo, string query)
{
    Lucene.Net.Store.Directory indexDir = Lucene.Net.Store.FSDirectory.Open(
                                                     indexInfo, _LockFactory);

    IndexSearcher indexSearcher = new IndexSearcher(indexDir, true);

    QueryParser parser = BuildQueryParser();
    var luceneQuery = parser.Parse(query);

    // Stopwatch measures elapsed time directly; subtracting TimeOfDay values breaks across midnight.
    var stopwatch = System.Diagnostics.Stopwatch.StartNew();

    var hits = indexSearcher.Search(luceneQuery);
    stopwatch.Stop();

    Console.WriteLine(&quot;Found &quot; + hits.Length() + 
        &quot; document(s) (in &quot; + stopwatch.ElapsedMilliseconds + 
        &quot; milliseconds) that matched query '&quot; + query + &quot;':&quot;);

    for (int i = 0; i &lt; hits.Length(); i++)
    {
        Document doc = hits.Doc(i);

        Console.WriteLine(doc.Get(&quot;filename&quot;));
    }

    // Release the index files once all hits have been read.
    indexSearcher.Close();
}

private static QueryParser BuildQueryParser()
{
    var parser = new QueryParser(
        Lucene.Net.Util.Version.LUCENE_29, &quot;contents&quot;,
        new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));

    parser.SetDefaultOperator(QueryParser.Operator.AND);
    return parser;
}
</pre>   
]]></content:encoded>
			<wfw:commentRss>http://jasonrowe.com/2010/06/23/lucene-in-action-net-samples/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

