C1, LINQ & Performance – what to know

Topics: General
Oct 13, 2011 at 2:19 PM
Edited Oct 14, 2011 at 10:59 AM

Two years back I posted a small article about optimizing LINQ queries on our old forum. Even though the article is quite old (and my English was even worse back then, lol), I still think it may improve your understanding of that particular technology and help you create more scalable C1 solutions. 

..................

Writing a highly scalable/responsive website takes a lot of work: it includes optimizing the hosting infrastructure (multi-server setups, geo-dependent hosting, content delivery networks, etc.), page speed/JavaScript optimizations, and optimizing the actual website code.

If a page on your site takes about 1 second to load, you have to admit there’s something wrong with it. Of course it’s sometimes rather difficult to achieve a 20 ms rendering time, but in most cases just a few basic code optimization tricks allow you to reach an acceptable 50–70 ms, which is enough to serve the page to 50–100 concurrent users.

The algorithm of performance optimization is as simple as it gets: find a "bottleneck" / non-optimal code and fix it; repeat until pleased with the result.

The first part – finding bottlenecks – is achievable by using performance profilers: you can use either the built-in C1 page profiler (article coming soon) or one of the .NET profilers out there. I tried out quite a few, and recommend Drone Profiler or JetBrains dotTrace.

The second part is fixing it. In most cases the problem code is in the way you query/process data, and the article below describes how LINQ works and what circumstances its execution time depends on.

 

Composite C1 allows us to choose between two different ways of storing data – XML and SQL. 

 

Part 1. Introduction, XML data provider.

To start with, let's look at the following query:

 

public static string GetPageTitle(Guid pageId)
{
    IQueryable<IPage> pages = DataFacade.GetData<IPage>();

    return (from page in pages
            where page.Id == pageId
            select page.Title).FirstOrDefault();
}

 

The XML data provider works rather simply – the very first time the code is executed, it reads the “IPage” collection from the related XML file (/App_Data/Composite/DataStores/*.xml), and all the queries are then executed over an in-memory collection of those pages (the System.Linq.EnumerableQuery<T> class). 


Execution time of our GetPageTitle() method can be estimated as:

1)  Getting pages from XML file (<5ms, and only for the first call)

2)  Compiling LINQ – 7ms

3)  Executing query < 1ms

So we have 7 ms of execution time even if there is only one page in the system. In order to make the execution time shorter, we can use the IEnumerable<> interface instead of IQueryable<>:

 

        public static string GetPageTitle2(Guid pageId)
        {
            // Using IEnumerable<>
            IEnumerable<IPage> pages = DataFacade.GetData<IPage>();

            return (from page in pages
                    where page.Id == pageId
                    select page.Title).FirstOrDefault();
        }

 

 

The new code looks pretty much the same, but if you take a look at the generated code through a .NET decompiler (e.g. Reflector or JustDecompile), you will see that in the first variant we’re building a small expression tree, which requires compilation. In the second case there are just a few method calls over an IEnumerable<IPage> object, which execute rather quickly.

 

public static string GetPageTitle(Guid pageId)
{
    ParameterExpression CS$0$0001;
    return DataFacade.GetData<IPage>()
        .Where<IPage>(
            Expression.Lambda<Func<IPage, bool>>(
                Expression.Equal(
                    Expression.Property(
                        CS$0$0001 = Expression.Parameter(typeof(IPage), "page"),
                        (MethodInfo) methodof(IPage.get_Id)),
                    Expression.Constant(pageId),
                    false,
                    (MethodInfo) methodof(Guid.op_Equality)),
                new ParameterExpression[] { CS$0$0001 }))
        .Select<IPage, string>(
            Expression.Lambda<Func<IPage, string>>(
                Expression.Property(
                    CS$0$0001 = Expression.Parameter(typeof(IPage), "page"),
                    (MethodInfo) methodof(IPage.get_Title)),
                new ParameterExpression[] { CS$0$0001 }))
        .FirstOrDefault<string>();
}

public static string GetPageTitle2(Guid pageId)
{
    return DataFacade.GetData<IPage>()
        .Where<IPage>(delegate (IPage page) { return page.Id == pageId; })
        .Select<IPage, string>(delegate (IPage page) { return page.Title; })
        .FirstOrDefault<string>();
}

 

And here we can make a small summary.

TIP 1: It’s better to use the IEnumerable<> interface instead of IQueryable<> while working with data kept in memory, e.g. a List<T> object or data coming from the XML data provider.

 

The other thing that is nice to know while working with EnumerableQuery<> (the XML data provider) is that it is all too easy to get bad execution complexity. If you didn’t have the pleasure of studying “Analysis of algorithms” at university, here is a small example:

 

var table =
    (from product in DataFacade.GetData<IProduct>()
     from productCategory in DataFacade.GetData<IProductCategory>()
     where product.Id == productCategory.ProductId
     select new { product.Id, product.Name, productCategory.CategoryName }).ToList();

 

Everything seems to be right, and it would be right for the SQL data provider. But with XML it will have O(n*n) complexity, which means that if you have 10,000 products (and a similar number of categories), it will take 100,000,000 operations (in this case, comparing product.Id and productCategory.ProductId) – the query may take minutes to execute. If you aren’t yet familiar with what O(n*n) means, I recommend this link: http://en.wikipedia.org/wiki/Analysis_of_algorithms

In the current example the solution would be to use a hashtable. The optimized code looks like this:

 

var categoryByProductId = new Composite.Collections.Generic.Hashtable<Guid, IProductCategory>();
foreach (var category in DataFacade.GetData<IProductCategory>())
{
    categoryByProductId.Add(category.ProductId, category);
}

var table = (from product in DataFacade.GetData<IProduct>()
             select new { product.Id, product.Name, categoryByProductId[product.Id].CategoryName }).ToList();

 

This code has close to O(n) complexity, since a hashtable look-up takes amortized constant time. For 10,000 products the whole thing is on the order of tens of thousands of operations and will likely take less than 10 ms to execute.
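The nested-loop vs. hash-lookup trade-off isn't specific to C# or LINQ. Here is a small sketch of the same two join strategies in Python (the product/category data is made up for illustration); both produce the same rows, but the first compares every product against every category, while the second does one dictionary look-up per product:

```python
import uuid

def nested_loop_join(products, categories):
    # O(n*m): every product is compared against every category
    return [(p["name"], c["category_name"])
            for p in products
            for c in categories
            if p["id"] == c["product_id"]]

def hash_join(products, categories):
    # Build the look-up table once (O(m)), then join with O(1) look-ups per product
    category_by_product_id = {c["product_id"]: c for c in categories}
    return [(p["name"], category_by_product_id[p["id"]]["category_name"])
            for p in products
            if p["id"] in category_by_product_id]

# Tiny demo data set: each product has exactly one matching category
product_ids = [uuid.uuid4() for _ in range(5)]
products = [{"id": pid, "name": f"product-{i}"} for i, pid in enumerate(product_ids)]
categories = [{"product_id": pid, "category_name": f"category-{i}"}
              for i, pid in enumerate(product_ids)]

assert nested_loop_join(products, categories) == hash_join(products, categories)
```

With 5 rows the difference is invisible, but at 10,000 rows the first version does ~100,000,000 comparisons while the second does ~20,000 operations.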

TIP 2: While using the XML data provider, it’s better not to join more than 2 tables in the same query. If you still have such a query, make sure it will never run over more than a few hundred records, so the query will not become a bottleneck that may load the server completely.

 

Part 2. Sql data provider, caching.

Let’s take a look at how query execution goes for the SQL data provider:

 

public IProduct GetProductById(Guid productId)
{
    IQueryable<IProduct> products = DataFacade.GetData<IProduct>();

    return (from product in products
            where product.Id == productId
            select product).FirstOrDefault();
}

 

Execution time:

1)      Compiling query  ~ 7ms

2)      SQL Server round-trip time ~ 5-10 ms (may be worse depending on where it is hosted)

3)      Parsing & execution ~ 2 ms.

And we have 15-20 ms for one query. So if we want to achieve a perfect response time (20 ms) for ordinary requests, we cannot afford even a single LINQ statement to be executed this way. Actually, LINQ isn’t that bad, and we will return to this point a bit later.

If we compare that result (15 ms) with the one we got for the XML data provider in Part 1 (< 1 ms), it may seem that using the SQL data provider does not make sense at all. In order to fix this disadvantage, in Composite C1 we have added a “DataFacade” cache. The point is: if a data interface is marked as “cacheable”, DataFacade.GetData<…>() will return not a LINQ2SQL query, but an in-memory copy of the table (represented by the System.Linq.EnumerableQuery<T> class). And if this is the case, the same rules as for the XML data provider apply.

The optimized code will look like this:

 

public IProduct GetProductById(Guid productId)
{
    // Interface IProduct is marked as cacheable
    IEnumerable<IProduct> products = DataFacade.GetData<IProduct>();

    return (from product in products
            where product.Id == productId
            select product).FirstOrDefault();
}

 

As you can see, I’ve applied TIP 1 and used IEnumerable<> instead of IQueryable<> in order to avoid losing time on query compilation.

TIP 3: For those data types that don't have much data (< 10,000 rows) it makes sense to check the "Has caching" checkbox in the type editor, which tells the system to keep the data rows in memory, removing the need for round-trips to the SQL server.

Once all the basic queries are optimized, you may consider caching the results of LINQ queries, as this approach gives the best results.

Here's an example of usage of the QueryCache class, which allows you to create a "data item by property value" cache.

 

    public class Products
    {
        private static readonly QueryCache<Product, Guid> _productById = new QueryCache<Product, Guid>("Product by id", p => p.Id, 10000);

        public static Product GetProductById(Guid productId)
        {
            return _productById[productId];
        }
    }

 

This way the actual data query will be executed only the first time; performance-wise, subsequent calls to _productById[productId] amount to just a hash-table look-up. Once data is modified, QueryCache automatically clears the related cached rows.
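If you are curious what such a cache looks like under the hood, here is a minimal language-agnostic sketch of the same idea in Python – a lazily built "item by key" dictionary that can be invalidated when the underlying data changes. The class and method names are my own for illustration; this is not the C1 implementation:

```python
class QueryCache:
    """Caches items from a data source, keyed by one of their properties."""

    def __init__(self, fetch_all, key_selector):
        self._fetch_all = fetch_all        # callable that runs the actual query
        self._key_selector = key_selector  # extracts the cache key from an item
        self._index = None                 # built lazily on first access

    def __getitem__(self, key):
        if self._index is None:
            # First access: run the actual query once and index the result
            self._index = {self._key_selector(item): item
                           for item in self._fetch_all()}
        return self._index.get(key)

    def invalidate(self):
        # Call when the underlying data changes; next access rebuilds the index
        self._index = None

# Usage: cache products by id
products = [{"id": 1, "title": "Chair"}, {"id": 2, "title": "Table"}]
product_by_id = QueryCache(lambda: products, lambda p: p["id"])
assert product_by_id[2]["title"] == "Table"
```

The real QueryCache additionally hooks into C1's data change events so the invalidation happens automatically, but the look-up path is the same: one query on first access, a dictionary look-up afterwards.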

TIP 4: For simple queries use the QueryCache class, and try to break up complex queries into simpler "cacheable" parts.

 

That's basically it with LINQ and performance.

We also have a built-in page rendering profiler. Usage is quite simple: wrap performance-critical C# code in the following "using" block:

using(Composite.Core.Instrumentation.Profiler.Measure("my code"))
{
  /* your code */
}

then add "?c1mode=perf" to the page URL, and if you're logged in to the C1 console, you'll see the profiling report for that page.
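The pattern behind Profiler.Measure – a disposable timing scope that records how long the wrapped code took – is easy to reproduce in other languages too. Here is a rough Python equivalent using a context manager (the names here are illustrative, not the C1 API):

```python
import time
from contextlib import contextmanager

measurements = []

@contextmanager
def measure(label):
    # Records wall-clock time spent inside the "with" block
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        measurements.append((label, elapsed_ms))

with measure("my code"):
    total = sum(range(1000))

label, elapsed_ms = measurements[0]
print(f"{label}: {elapsed_ms:.2f} ms")
```

The try/finally ensures the measurement is recorded even if the wrapped code throws, which is the same guarantee a C# "using" block gives via Dispose().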