Monthly Archives: April 2010
Over the past few weeks I have got quite involved in the development of the Open Source NoRM project started by Andrew Theken which is a MongoDB driver for C#. In particular I have been refactoring and adding new functionality to the LINQ provider.
It currently supports a lot of functionality including deep queries, regex, datetime which is really exciting. In this post though I am going to concentrate on regular expressions.
This is the newest part of the Linq provider however it is probably the most powerful, especially for complex queries. The reason for this is that MongoDB will use the indexes created when a regular expression is used (where the query is not a complex query). A complex query is one that filters on the same property twice, uses a string function (replace/substring/toLower etc) or does some other fancy stuff. For example using the toUpper() method.
var products = session.Products.Where(x => x.Name.ToUpper() == "TEST3").ToList();
Anyways….i digress. So, here is an example of using a Regex in a Linq Query. Pretty simple.
var products = session.Products.Where(p => Regex.IsMatch(p.Name, "^te")).ToList();
Using the static Regex.IsMatch is the only way to invoke a regex call using the Linq Provider. This will however run blazingly fast. I tested this query on 1,000,000 Products and it only took 1.5sec, which was approximately 10x faster than when a complex query is invoked. There are however 3 string functions that have been optimized using this regex functionality. They are StartsWith(), EndsWith() and Contains() which is why the following query only takes 2secs to return over 48,000 rows, however when using Skip() and Take() you can get 50 results back in just milliseconds.
var products = session.Products.Where(x => x.Name.StartsWith("X")).ToList();
The Linq provider also supports 3 of the RegexOptions. They are RegexOptions.IgnoreCase (but please note this will not use the index so will be slower), RegexOptions.Multiline and RegexOptions.None. Regex’s can also be used in conjunction with other filters. eg.
var products = session.Products.Where(p => Regex.IsMatch(p.Name, "^te") && p.Price == 10).ToList();
This query is not considered a complex query because two different properties are used and an “and”(&&) operator is used.
Please note: Any time a “or” (||) operator is used, it will be considered a complex query.
If you have a filter than can be written as a regex, chances are it will be as fast or faster than without using a regex.
So please go and try out NoRM and enjoy the freedom. I will try and post some more cool stuff in the LINQ provider over the next few weeks. Stay tuned.