<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Schotime.net &#187; Regex</title>
	<atom:link href="http://schotime.net/blog/index.php/tag/regex/feed/" rel="self" type="application/rss+xml" />
	<link>http://schotime.net/blog</link>
	<description>All Things .Net and Me</description>
	<lastBuildDate>Thu, 01 Jul 2010 14:42:28 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>NOSQL, NoRM (mongoDB) and Regular Expressions</title>
		<link>http://schotime.net/blog/index.php/2010/04/29/nosql-norm-mongodb-and-regular-expressions/</link>
		<comments>http://schotime.net/blog/index.php/2010/04/29/nosql-norm-mongodb-and-regular-expressions/#comments</comments>
		<pubDate>Thu, 29 Apr 2010 07:57:29 +0000</pubDate>
		<dc:creator>Schotime</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[Linq]]></category>
		<category><![CDATA[mongodb]]></category>
		<category><![CDATA[NoRM]]></category>
		<category><![CDATA[Regex]]></category>

		<guid isPermaLink="false">http://schotime.net/blog/index.php/2010/04/29/nosql-norm-mongodb-and-regular-expressions/</guid>
		<description><![CDATA[Over the past few weeks I have got quite involved in the development of the Open Source NoRM project started by Andrew Theken which is a MongoDB driver for C#. In particular I have been refactoring and adding new functionality to the LINQ provider.
It currently supports a lot of functionality including deep queries, regex, datetime [...]]]></description>
			<content:encoded><![CDATA[<p>Over the past few weeks I have become quite involved in the development of <a href="http://github.com/atheken/NoRM" onclick="pageTracker._trackPageview('/outgoing/github.com/atheken/NoRM?referer=');">NoRM</a>, an open source <a href="http://www.mongodb.org/" onclick="pageTracker._trackPageview('/outgoing/www.mongodb.org/?referer=');">MongoDB</a> driver for C# started by <a href="http://andrewtheken.com/" onclick="pageTracker._trackPageview('/outgoing/andrewtheken.com/?referer=');">Andrew Theken</a>. In particular, I have been refactoring and adding new functionality to the LINQ provider.</p>
<p>It currently supports a lot of functionality, including deep queries, regex and DateTime support, which is really exciting. In this post, though, I am going to concentrate on regular expressions.</p>
<p>This is the newest part of the LINQ provider, yet it is probably the most powerful, especially as an alternative to complex queries. The reason is that MongoDB can use the indexes you have created when a regular expression is used, provided the query is not a complex query. A complex query is one that filters on the same property twice, uses a string function (Replace, Substring, ToLower etc.) or does some other fancy stuff. For example, this query calls the ToUpper() method, so it is a complex query:</p>
<table style="background: black;" border="1" cellspacing="0" cellpadding="2" width="400">
<tbody>
<tr>
<td width="400" valign="top">
<pre class="code"><span style="background: black; color: #cc7832;">var </span><span style="background: black; color: white;">products = session.Products.Where(x =&gt; x.Name.ToUpper() == </span><span style="background: black; color: #a5c25c;">"TEST3"</span><span style="background: black; color: white;">).ToList();</span></pre>
</td>
</tr>
</tbody>
</table>
<p>Anyway, I digress. Here is an example of using a Regex in a LINQ query. Pretty simple.</p>
<table style="background: black;" border="1" cellspacing="0" cellpadding="2" width="400">
<tbody>
<tr>
<td width="400" valign="top">
<pre class="code"><span style="background: black; color: #cc7832;">var </span><span style="background: black; color: white;">products = session.Products.Where(p =&gt; </span><span style="background: black; color: #ffc66d;">Regex</span><span style="background: black; color: white;">.IsMatch(p.Name, </span><span style="background: black; color: #a5c25c;">"^te"</span><span style="background: black; color: white;">)).ToList();</span></pre>
</td>
</tr>
</tbody>
</table>
<p>Using the static Regex.IsMatch method is the only way to invoke a regex through the LINQ provider. It does, however, run blazingly fast. I tested this query on 1,000,000 Products and it took only 1.5 seconds, which was approximately 10x faster than when a complex query is invoked. Three string functions have also been optimized using this regex functionality: StartsWith(), EndsWith() and Contains(). That is why the following query takes only 2 seconds to return over 48,000 rows, and with Skip() and Take() you can get 50 results back in just milliseconds.</p>
<table style="background: black;" border="1" cellspacing="0" cellpadding="2" width="400">
<tbody>
<tr>
<td width="400" valign="top">
<pre class="code"><span style="background: black; color: #cc7832;">var </span><span style="background: black; color: white;">products = session.Products.Where(x =&gt; x.Name.StartsWith(</span><span style="background: black; color: #a5c25c;">"X"</span><span style="background: black; color: white;">)).ToList();</span></pre>
</td>
</tr>
</tbody>
</table>
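<p>These three methods can share the regex path because each one has a simple pattern equivalent. The following is only a sketch of that idea in plain .NET, not NoRM’s actual translation code: the PatternSketch class and its method names are hypothetical, and the real provider may build its patterns differently.</p>
<table style="background: black;" border="1" cellspacing="0" cellpadding="2" width="400">
<tbody>
<tr>
<td width="400" valign="top">
<pre class="code"><span style="background: black; color: white;">using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of NoRM): builds the regex pattern
// that corresponds to each of the three string methods.
public static class PatternSketch
{
    // Regex.Escape stops characters such as '.' or '*' being read as regex syntax.
    public static string ForStartsWith(string value) { return "^" + Regex.Escape(value); }
    public static string ForEndsWith(string value) { return Regex.Escape(value) + "$"; }
    public static string ForContains(string value) { return Regex.Escape(value); }
}

public static class Program
{
    public static void Main()
    {
        string name = "Xylophone";

        // Each line prints True, matching what the string method would return.
        Console.WriteLine(Regex.IsMatch(name, PatternSketch.ForStartsWith("X")));
        Console.WriteLine(Regex.IsMatch(name, PatternSketch.ForEndsWith("phone")));
        Console.WriteLine(Regex.IsMatch(name, PatternSketch.ForContains("lo")));
    }
}</span></pre>
</td>
</tr>
</tbody>
</table>
<p>Of the three, only the StartsWith() pattern is anchored to the start of the string, which is the form where an index helps MongoDB the most.</p>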
<p>The LINQ provider also supports three of the RegexOptions values: RegexOptions.IgnoreCase (please note that this will not use the index, so it will be slower), RegexOptions.Multiline and RegexOptions.None. Regexes can also be used in conjunction with other filters, e.g.</p>
<table style="background: black;" border="1" cellspacing="0" cellpadding="2" width="400">
<tbody>
<tr>
<td width="400" valign="top">
<pre class="code"><span style="background: black; color: #cc7832;">var </span><span style="background: black; color: white;">products = session.Products.Where(p =&gt; </span><span style="background: black; color: #ffc66d;">Regex</span><span style="background: black; color: white;">.IsMatch(p.Name, </span><span style="background: black; color: #a5c25c;">"^te"</span><span style="background: black; color: white;">) &amp;&amp; p.Price == </span><span style="background: black; color: #6897bb;">10</span><span style="background: black; color: white;">).ToList();</span></pre>
</td>
</tr>
</tbody>
</table>
<p>This query is not considered a complex query because it filters on two different properties and combines them with the “and” (&amp;&amp;) operator.</p>
<p><strong><em>Please note:</em></strong> Any time an “or” (||) operator is used, the query will be considered a complex query.</p>
<p><strong>Summary:</strong></p>
<p>If you have a filter that can be written as a regex, chances are it will be as fast as, or faster than, the same filter written without one.</p>
<p>So please go and try out NoRM and enjoy the freedom. I will try and post some more cool stuff in the LINQ provider over the next few weeks. Stay tuned.</p>
]]></content:encoded>
			<wfw:commentRss>http://schotime.net/blog/index.php/2010/04/29/nosql-norm-mongodb-and-regular-expressions/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Linq and Regular Expressions</title>
		<link>http://schotime.net/blog/index.php/2008/03/10/linq-and-regular-expressions/</link>
		<comments>http://schotime.net/blog/index.php/2008/03/10/linq-and-regular-expressions/#comments</comments>
		<pubDate>Mon, 10 Mar 2008 04:40:23 +0000</pubDate>
		<dc:creator>Schotime</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[Linq]]></category>
		<category><![CDATA[Regex]]></category>

		<guid isPermaLink="false">http://dev.schotime.net/blog/index.php/2008/03/10/linq-and-regular-expressions/</guid>
		<description><![CDATA[With LINQ now standard in .NET 3.5, there is no reason why we shouldn&#8217;t use it. After all, it&#8217;s full of features that can be used by any object that implements IEnumerable. With such power at our fingertips, sorting, filtering, manipulation etc. are available to us with fewer lines of code than [...]]]></description>
			<content:encoded><![CDATA[<p>With LINQ now standard in .NET 3.5, there is no reason why we shouldn&#8217;t use it. After all, it&#8217;s full of features that can be used by any object that implements IEnumerable. With such power at our fingertips, sorting, filtering, manipulation etc. are available to us with fewer lines of code than previously needed.</p>
<p>One powerful feature of programming is Regular Expressions. These provide a concise and flexible means for identifying text of interest, such as particular characters, words, or patterns of characters. So whilst going over some old code of mine to extract data from a remote website, I decided to give the Regular Expression part of my code a face lift with LINQ.</p>
<p>The code below is the setup code just to give some background.</p>
<table style="border-collapse: collapse" cellspacing="0" cellpadding="2" width="463" border="1">
<tbody>
<tr>
<td valign="top" width="461">
<pre class="code"><span style="color: #2b91af">String </span>StringToMatch = <span style="color: #a31515">&quot;&lt;tr class=\&quot;ar1\&quot;&gt;&lt;td&gt;456642&lt;/td&gt;&quot;
        </span>+ <span style="color: #a31515">&quot;&lt;td class=\&quot;left\&quot;&gt;John&lt;/td&gt;&quot;
        </span>+ <span style="color: #a31515">&quot;&lt;td class=\&quot;left\&quot;&gt;Smith&lt;/td&gt;&quot;
        </span>+ <span style="color: #a31515">&quot;&lt;td&gt;j.smith@email.com&lt;/td&gt;&lt;/tr&gt;&quot;
        </span>+ <span style="color: #a31515">&quot;&lt;tr class=\&quot;ar1\&quot;&gt;&lt;td&gt;456643&lt;/td&gt;&quot;
        </span>+ <span style="color: #a31515">&quot;&lt;td class=\&quot;left\&quot;&gt;Edward&lt;/td&gt;&quot;
        </span>+ <span style="color: #a31515">&quot;&lt;td class=\&quot;left\&quot;&gt;Norman&lt;/td&gt;&quot;
        </span>+ <span style="color: #a31515">&quot;&lt;td&gt;e.norman@email.com&lt;/td&gt;&lt;/tr&gt;&quot;</span>;

<span style="color: #2b91af">Regex </span>r = <span style="color: blue">new </span><span style="color: #2b91af">Regex</span>(<span style="color: #a31515">&quot;&lt;tr class=\&quot;(?:ar1|ar2)\&quot;&gt;&lt;td&gt;([0-9]+)&lt;/td&gt;&quot;
        </span>+ <span style="color: #a31515">&quot;&lt;td class=\&quot;left\&quot;&gt;(.*?)&lt;/td&gt;&quot;
        </span>+ <span style="color: #a31515">&quot;&lt;td class=\&quot;left\&quot;&gt;(.*?)&lt;/td&gt;&quot;
        </span>+ <span style="color: #a31515">&quot;&lt;td&gt;(.*?)&lt;/td&gt;&lt;/tr&gt;&quot;</span>);

<span style="color: #2b91af">MatchCollection </span>matches = r.Matches(StringToMatch);</pre>
</td>
</tr>
</tbody>
</table>
<p>The following code is the pre-LINQ version of the code to process the Regular Expression matches.</p>
<table style="border-collapse: collapse" cellspacing="0" cellpadding="2" width="400" border="1">
<tbody>
<tr>
<td valign="top" width="400">
<pre class="code"><span style="color: #2b91af">List</span>&lt;<span style="color: #2b91af">Profile</span>&gt; Profiles = <span style="color: blue">new </span><span style="color: #2b91af">List</span>&lt;<span style="color: #2b91af">Profile</span>&gt;();

<span style="color: blue">if </span>(matches.Count &gt; 0)
{
    <span style="color: blue">foreach </span>(<span style="color: #2b91af">Match </span>m <span style="color: blue">in </span>matches)
    {
        <span style="color: #2b91af">Profile </span>p = <span style="color: blue">new </span><span style="color: #2b91af">Profile</span>();

        p.Id = m.Groups[1].Value;
        p.Firstname = m.Groups[2].Value;
        p.Lastname = m.Groups[3].Value;
        p.Email = m.Groups[4].Value;

        Profiles.Add(p);
    }
}</pre>
</td>
</tr>
</tbody>
</table>
<p>As you can see above, a strongly typed List of type Profile is created. Then we loop through each match, creating a new instance of the Profile object, filling it with the results from our Regular Expression and finally adding it to the list. Whilst this code is pretty straightforward, look how easily LINQ handles this scenario.</p>
<table style="border-collapse: collapse" cellspacing="0" cellpadding="2" width="400" border="1">
<tbody>
<tr>
<td valign="top" width="400">
<pre class="code"><span style="color: blue">if </span>(matches.Count &gt; 0)
{
     <span style="color: #2b91af">List</span>&lt;<span style="color: #2b91af">Profile</span>&gt; Profiles = (<span style="color: blue">from </span><span style="color: #2b91af">Match </span>m <span style="color: blue">in </span>matches
                              <span style="color: blue">select new </span><span style="color: #2b91af">Profile
                              </span>{
                                  Id = m.Groups[1].Value,
                                  Firstname = m.Groups[2].Value,
                                  Lastname = m.Groups[3].Value,
                                  Email = m.Groups[4].Value
                              }).ToList();
}</pre>
</td>
</tr>
</tbody>
</table>
<p>As you can see above, we have managed to reduce the number of statements from around 8 to 1. In plain English, this creates a strongly typed List of type Profile and uses LINQ to fill it. It states that for every Match m in the collection matches, create a new Profile object and initialise its properties with the values contained in the match. Finally we convert the IEnumerable&lt;Profile&gt; result to a List&lt;Profile&gt; by calling the method ToList().</p>
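<p>The same query can also be written with method syntax, since <em>from Match m in matches</em> is shorthand for calling Cast&lt;Match&gt;() on the collection. The sketch below is one self-contained way to put it together; the ProfileParser class and its Parse method are names made up for this illustration, not part of the original code. An empty MatchCollection simply produces an empty list here, so the Count check is not needed.</p>
<table style="border-collapse: collapse" cellspacing="0" cellpadding="2" width="400" border="1">
<tbody>
<tr>
<td valign="top" width="400">
<pre class="code">using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class Profile
{
    public string Id { get; set; }
    public string Firstname { get; set; }
    public string Lastname { get; set; }
    public string Email { get; set; }
}

// Hypothetical wrapper class, used only for this illustration.
public static class ProfileParser
{
    // Same pattern as in the setup code.
    private static readonly Regex RowPattern = new Regex(
        &quot;&lt;tr class=\&quot;(?:ar1|ar2)\&quot;&gt;&lt;td&gt;([0-9]+)&lt;/td&gt;&quot;
        + &quot;&lt;td class=\&quot;left\&quot;&gt;(.*?)&lt;/td&gt;&quot;
        + &quot;&lt;td class=\&quot;left\&quot;&gt;(.*?)&lt;/td&gt;&quot;
        + &quot;&lt;td&gt;(.*?)&lt;/td&gt;&lt;/tr&gt;&quot;);

    public static List&lt;Profile&gt; Parse(string html)
    {
        // MatchCollection is a non-generic collection, so Cast&lt;Match&gt;()
        // is needed before the generic LINQ operators can be used.
        return RowPattern.Matches(html)
            .Cast&lt;Match&gt;()
            .Select(m =&gt; new Profile
            {
                Id = m.Groups[1].Value,
                Firstname = m.Groups[2].Value,
                Lastname = m.Groups[3].Value,
                Email = m.Groups[4].Value
            })
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        string html = &quot;&lt;tr class=\&quot;ar1\&quot;&gt;&lt;td&gt;456642&lt;/td&gt;&quot;
            + &quot;&lt;td class=\&quot;left\&quot;&gt;John&lt;/td&gt;&quot;
            + &quot;&lt;td class=\&quot;left\&quot;&gt;Smith&lt;/td&gt;&quot;
            + &quot;&lt;td&gt;j.smith@email.com&lt;/td&gt;&lt;/tr&gt;&quot;;

        List&lt;Profile&gt; profiles = ProfileParser.Parse(html);
        Console.WriteLine(profiles.Count);       // prints 1
        Console.WriteLine(profiles[0].Lastname); // prints Smith
    }
}</pre>
</td>
</tr>
</tbody>
</table>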
<p>How easy was that! Now say you wanted the list of Profiles sorted by last name. You would normally have to build the list as above and then call the Sort method with a defined Comparison delegate. This is where LINQ becomes even more powerful. Simply by adding one line to the LINQ statement above, the List generated will be sorted by last name.</p>
<table style="border-collapse: collapse" cellspacing="0" cellpadding="2" width="400" border="1">
<tbody>
<tr>
<td valign="top" width="400">
<pre class="code"><span style="color: blue">if </span>(matches.Count &gt; 0)
{
     <span style="color: #2b91af">List</span>&lt;<span style="color: #2b91af">Profile</span>&gt; Profiles = (<span style="color: blue">from </span><span style="color: #2b91af">Match </span>m <span style="color: blue">in </span>matches
                              <strong><span style="color: blue">orderby </span>m.Groups[3].Value</strong>
                              <span style="color: blue">select new </span><span style="color: #2b91af">Profile
                              </span>{
                                  Id = m.Groups[1].Value,
                                  Firstname = m.Groups[2].Value,
                                  Lastname = m.Groups[3].Value,
                                  Email = m.Groups[4].Value
                              }).ToList();
}</pre>
</td>
</tr>
</tbody>
</table>
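<p>For comparison, the pre-LINQ route mentioned earlier (build the list, then call Sort with a Comparison) might look something like the sketch below. The ProfileSorting class and the SortByLastname method are names invented for this example. One difference worth knowing: List.Sort sorts the list in place and is not a stable sort, whereas orderby leaves the source untouched and keeps equal items in their original order.</p>
<table style="border-collapse: collapse" cellspacing="0" cellpadding="2" width="400" border="1">
<tbody>
<tr>
<td valign="top" width="400">
<pre class="code">using System;
using System.Collections.Generic;

public class Profile
{
    public string Id { get; set; }
    public string Firstname { get; set; }
    public string Lastname { get; set; }
    public string Email { get; set; }
}

// Hypothetical helper class, used only for this illustration.
public static class ProfileSorting
{
    public static void SortByLastname(List&lt;Profile&gt; profiles)
    {
        // .NET 2.0 style: an anonymous delegate acts as the Comparison.
        profiles.Sort(delegate(Profile a, Profile b)
        {
            return string.Compare(a.Lastname, b.Lastname);
        });
    }
}

public static class Program
{
    public static void Main()
    {
        List&lt;Profile&gt; profiles = new List&lt;Profile&gt;();
        profiles.Add(new Profile { Firstname = &quot;John&quot;, Lastname = &quot;Smith&quot; });
        profiles.Add(new Profile { Firstname = &quot;Edward&quot;, Lastname = &quot;Norman&quot; });

        ProfileSorting.SortByLastname(profiles);

        Console.WriteLine(profiles[0].Lastname); // prints Norman
    }
}</pre>
</td>
</tr>
</tbody>
</table>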
<p>One more point, regarding the definition of the Profile class. Back in the .NET 2.0 days creating a class was pretty painful: a lot of repeated code just to get an object with some properties.</p>
<table style="border-collapse: collapse" cellspacing="0" cellpadding="2" width="400" border="1">
<tbody>
<tr>
<td valign="top" width="400">
<pre class="code"><span style="color: blue">public class </span><span style="color: #2b91af">Profile
</span>{
    <span style="color: blue">private string </span>_id;
    <span style="color: blue">private string </span>_firstname;
    <span style="color: blue">private string </span>_lastname;
    <span style="color: blue">private string </span>_email;

    <span style="color: blue">public string </span>Id
    {
        <span style="color: blue">get </span>{ <span style="color: blue">return </span>_id; }
        <span style="color: blue">set </span>{ _id = <span style="color: blue">value</span>; }
    }

    <span style="color: blue">public string </span>Firstname
    {
        <span style="color: blue">get </span>{ <span style="color: blue">return </span>_firstname; }
        <span style="color: blue">set </span>{ _firstname = <span style="color: blue">value</span>; }
    }

    <span style="color: blue">public string </span>Lastname
    {
        <span style="color: blue">get </span>{ <span style="color: blue">return </span>_lastname; }
        <span style="color: blue">set </span>{ _lastname = <span style="color: blue">value</span>; }
    }

    <span style="color: blue">public string </span>Email
    {
        <span style="color: blue">get </span>{ <span style="color: blue">return </span>_email; }
        <span style="color: blue">set </span>{ _email = <span style="color: blue">value</span>; }
    }
}</pre>
</td>
</tr>
</tbody>
</table>
<p>As you can see, it&#8217;s way too long. Let&#8217;s see how it is done after .NET 2.0, using the automatic properties introduced in C# 3.0.</p>
<table style="border-collapse: collapse" cellspacing="0" cellpadding="2" width="400" border="1">
<tbody>
<tr>
<td valign="top" width="400">
<pre class="code"><span style="color: blue">public class </span><span style="color: #2b91af">Profile
</span>{
    <span style="color: blue">public string </span>Id { <span style="color: blue">get</span>; <span style="color: blue">set</span>; }
    <span style="color: blue">public string </span>Firstname { <span style="color: blue">get</span>; <span style="color: blue">set</span>; }
    <span style="color: blue">public string </span>Lastname { <span style="color: blue">get</span>; <span style="color: blue">set</span>; }
    <span style="color: blue">public string </span>Email { <span style="color: blue">get</span>; <span style="color: blue">set</span>; }
}</pre>
</td>
</tr>
</tbody>
</table>
<p>Now that&#8217;s what I&#8217;m talking about. Good work team. That&#8217;s how easy it should be to create a class!</p>
<p>Til&#8217; Next Time, It&#8217;s Schotime Out!</p>
]]></content:encoded>
			<wfw:commentRss>http://schotime.net/blog/index.php/2008/03/10/linq-and-regular-expressions/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
