Monday, October 15, 2012

How to upload a file with a fixed file size using C# in ASP.NET

Uploading a PDF or Word file with a limited file size.

Step 1: Add the FileUpload control from the Toolbox.
Fig. 1: Adding the FileUpload control from the Toolbox
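Since the figure itself is not reproduced here, the markup below is a minimal sketch of what the page might contain after this step. The control IDs (FileUpload1, btnSubmit, lblGLId) match the code-behind that follows; the ScriptManager is included because the alert helper shown later relies on it, and everything else is illustrative only.

    <asp:ScriptManager ID="ScriptManager1" runat="server" />
    <asp:Label ID="lblGLId" runat="server" />
    <asp:FileUpload ID="FileUpload1" runat="server" />
    <asp:Button ID="btnSubmit" runat="server" Text="Upload" OnClick="btnSubmit_Click" />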
Step 2: On the button click event

On the button click we call a function that validates that the uploaded file is of the allowed format and that its size is below the defined limit. Here the maximum file size is 1 MB.

The following code performs the validation:

  protected void btnSubmit_Click(object sender, EventArgs e)
    {
        try
        {
            string file = "";
            // Proceed only when a file has actually been selected.
            if (FileUpload1.HasFile)
            {
                file = UploadFile();
            }
        }
        catch (Exception ex)
        {
            DisplayAJAXMessage(this, "Error:: " + ex.Message);
        }
    }

------------------------------------------------------------------------

private string UploadFile()
    {
        // Prefix the file name with the GL Id so each upload gets a distinct name.
        string file = lblGLId.Text + "_" + FileUpload1.FileName;

        // Enforce the size limit (roughly 1 MB).
        if (FileUpload1.PostedFile.ContentLength < 1000000)
        {
            string extension = Path.GetExtension(FileUpload1.PostedFile.FileName).ToLower();

            if (extension == ".pdf" || extension == ".doc" || extension == ".docx")
            {
                // "FolderName" is the folder in the project where uploads are stored.
                string path = Server.MapPath(".") + "\\FolderName\\" + file;
                FileUpload1.PostedFile.SaveAs(path);
            }
            else
            {
                DisplayAJAXMessage(this, "Only files with extension .PDF/.DOC/.DOCX are allowed to be uploaded");
            }
        }
        else
        {
            DisplayAJAXMessage(this, "Maximum file size is 1 MB");
        }
        return file;
    }
---------------------------------------------------------------------------------
    // Registers a startup script that shows the message in a JavaScript alert.
    static public void DisplayAJAXMessage(Control page, string msg)
    {
        string myScript = String.Format("alert('{0}');", msg);
        ScriptManager.RegisterStartupScript(page, page.GetType(), "MyScript", myScript, true);
    }
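One caveat with the helper above: String.Format pastes msg straight into the script, so a single quote or line break in the message (exception text often contains both) will break the generated alert(). If the project targets .NET 4.0 or later, a hardened variant can encode the message first; the sketch below assumes only that System.Web's HttpUtility is available, and the name DisplaySafeAJAXMessage is purely illustrative.

    static public void DisplaySafeAJAXMessage(Control page, string msg)
    {
        // JavaScriptStringEncode escapes quotes, backslashes and new lines
        // so the message cannot break out of the alert('...') call.
        string myScript = String.Format("alert('{0}');",
            HttpUtility.JavaScriptStringEncode(msg));
        ScriptManager.RegisterStartupScript(page, page.GetType(), "MyScript", myScript, true);
    }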


Note: "FolderName" should be replaced with the folder in the project where the uploaded files are to be kept.


Sunday, October 14, 2012

Display a Report Using Report Viewer on Button Click using C# in ASP.NET

To display a report in the Microsoft Report Viewer, follow these steps:

Step 1: Add the Microsoft ReportViewer control from the Reporting tab of the Toolbox.

Fig. 1: Adding the ReportViewer control to the design page
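Since the figure is not reproduced here, the markup below is a minimal sketch of what the design page might contain after this step. The control IDs match the code-behind used later; the rsweb tag prefix is the one Visual Studio registers when the ReportViewer is dragged from the Toolbox, and the dropdown lists and button are illustrative only.

    <asp:ScriptManager ID="ScriptManager1" runat="server" />
    <asp:DropDownList ID="ddlBatch" runat="server" />
    <asp:DropDownList ID="ddlProgramName" runat="server" />
    <asp:Button ID="btnShow" runat="server" Text="Show" OnClick="btnShow_Click" />
    <rsweb:ReportViewer ID="ReportViewer1" runat="server" Width="100%" Height="600px" />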

Step 2: Display the report on the "Show" button click.

On the button click we give the ReportViewer the parameters needed to display the appropriate report, based on the values selected in the dropdown lists.

Fig. 2: Report display
In Fig. 2, the user selects the details from the dropdown lists, and these values are passed as parameters on the "Show" button click.

Step 3: Code for the "Show" button click event.

 protected void btnShow_Click(object sender, EventArgs e)
    {
        try
        {
            // The parameter names (<BatchYear>, <ProgramCode>) must match the
            // parameters defined in the server report.
            ReportParameter[] parm = new ReportParameter[2];
            parm[0] = new ReportParameter("<BatchYear>", ddlBatch.SelectedValue);
            parm[1] = new ReportParameter("<ProgramCode>", ddlProgramName.SelectedValue);

            ReportViewer1.ShowCredentialPrompts = false;
            ReportViewer1.ShowParameterPrompts = false;
            ReportViewer1.ProcessingMode = Microsoft.Reporting.WebForms.ProcessingMode.Remote;

            // Credentials and location of the report on the report server.
            ReportViewer1.ServerReport.ReportServerCredentials = new ReportCredentials(UserId, Password, DomainName);
            ReportViewer1.ServerReport.ReportServerUrl = new System.Uri(ReportServerUrl);
            ReportViewer1.ServerReport.ReportPath = "//<ReportPath>";

            ReportViewer1.ServerReport.SetParameters(parm);
            ReportViewer1.ServerReport.Refresh();
        }
        catch (Exception ex)
        {
            DisplayAJAXMessage(this, "Error:: " + ex.Message);
        }
    }
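The ReportCredentials class used above is not a built-in type of the ReportViewer library; it refers to a small helper class that implements Microsoft.Reporting.WebForms.IReportServerCredentials and hands the report server a set of network credentials. A minimal sketch of such a class is shown below; the constructor parameters simply mirror the UserId, Password, and DomainName values passed in the click handler, and you should adapt it to your own authentication scheme.

    [Serializable]
    public class ReportCredentials : Microsoft.Reporting.WebForms.IReportServerCredentials
    {
        private readonly string _userId;
        private readonly string _password;
        private readonly string _domain;

        public ReportCredentials(string userId, string password, string domain)
        {
            _userId = userId;
            _password = password;
            _domain = domain;
        }

        // No Windows impersonation; explicit credentials are supplied instead.
        public System.Security.Principal.WindowsIdentity ImpersonationUser
        {
            get { return null; }
        }

        // Credentials sent to the report server with every request.
        public System.Net.ICredentials NetworkCredentials
        {
            get { return new System.Net.NetworkCredential(_userId, _password, _domain); }
        }

        // Forms authentication is not used in this example.
        public bool GetFormsCredentials(out System.Net.Cookie authCookie, out string userName,
                                        out string password, out string authority)
        {
            authCookie = null;
            userName = null;
            password = null;
            authority = null;
            return false;
        }
    }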


static public void DisplayAJAXMessage(Control page, string msg)
    {
        string myScript = String.Format("alert('{0}');", msg);
        ScriptManager.RegisterStartupScript(page, page.GetType(), "MyScript", myScript, true);
    }


Note: The text shown in bold (the placeholder values such as UserId, Password, DomainName, ReportServerUrl, the report path, and the parameter names) needs to be replaced according to your requirements.


Sunday, October 7, 2012

The SQL Server Query Optimizer - 2

Generating Candidate Execution Plans

The basic purpose of the Query Optimizer is to find an efficient execution plan for your query. Even for relatively simple queries, there may be a large number of different ways to access the data to produce the same end result. As such, the Query Optimizer has to select the best possible plan from what may be a very large number of candidate execution plans, and it’s important that it makes a wise choice, as the time it takes to return the results to the user can vary wildly, depending on which plan is selected.
The job of the Query Optimizer is to create and assess as many candidate execution plans as possible, within certain criteria, in order to arrive at the best possible plan. We define the search space for a given query as the set of all the possible execution plans for that query, and any possible plan in this search space returns the same results. Theoretically, in order to find the optimum execution plan for a query, a cost-based query optimizer should generate all possible execution plans that exist in that search space and correctly estimate the cost of each plan. However, some complex queries may have thousands or even millions of possible execution plans and, while the SQL Server Query Optimizer can typically consider a large number of candidate execution plans, it cannot perform an exhaustive search of all the possible plans for every query. If it did, then the time taken to assess all of the plans would be unacceptably long, and could start to have a major impact on the overall query execution time.
The Query Optimizer must strike a balance between optimization time and plan quality. For example, if the Query Optimizer spends one second finding a good enough plan that executes in one minute, then it doesn’t make sense to try to find the perfect or most optimal plan, if this is going to take five minutes of optimization time, plus the execution time. So SQL Server does not do an exhaustive search, but instead tries to find a suitably efficient plan as quickly as possible. As the Query Optimizer is working within a time constraint, there’s a chance that the plan selected may be the optimal plan but it is also likely that it may just be something close to the optimal plan.
In order to explore the search space, the Query Optimizer uses transformation rules and heuristics. The generation of candidate execution plans is performed inside the Query Optimizer using transformation rules, and the use of heuristics limits the number of choices considered in order to keep the optimization time reasonable. Candidate plans are stored in memory during the optimization, in a component called the Memo.

Assessing the Cost of each Plan

Searching, or enumerating candidate plans is just one part of the optimization process. The Query Optimizer still needs to estimate the cost of these plans and select the least expensive one. To estimate the cost of a plan, it estimates the cost of each physical operator in that plan using costing formulas that consider the use of resources such as I/O, CPU, and memory. This cost estimation depends mostly on the algorithm used by the physical operator, as well as the estimated number of records that will need to be processed; this estimate of the number of records is known as the cardinality estimation.
To help with this cardinality estimation, SQL Server uses and maintains optimizer statistics, which contain statistical information describing the distribution of values in one or more columns of a table. Once the cost for each operator is estimated using estimations of cardinality and resource demands, the Query Optimizer will add up all of these costs to estimate the cost for the entire plan.

Query Execution and Plan Caching

Once the query is optimized, the resulting plan is used by the Execution Engine to retrieve the desired data. The generated execution plan may be stored in memory, in the plan cache (known as the procedure cache in previous versions of SQL Server) in order that it might be reused if the same query is executed again. If a valid plan is available in the plan cache, then the optimization process can be skipped and the associated cost of this step, in terms of optimization time, CPU resources, and so on, can be avoided.
However, reuse of an existing plan may not always be the best solution for a given query. Depending on the distribution of data within a table, the optimal execution plan for a given query may differ greatly depending on the parameters provided in said query, and a behavior known as parameter sniffing may result in a suboptimal plan being chosen.
Even when an execution plan is available in the plan cache, some metadata changes, such as removing an index or a constraint, or significant enough changes made to the contents of the database, may render an existing plan invalid or suboptimal, and thus cause it to be discarded from the plan cache and a new optimization to be generated. As a trivial example, removing an index will make a plan invalid if the index is used by that plan. Likewise, the creation of a new index could make a plan suboptimal, if this index could be used to create a more efficient alternative plan, and enough changes to the database contents may trigger an automatic update of statistics, with the same effect on the existing plan.
Plans may also be removed from the plan cache when SQL Server is under memory pressure or when certain statements are executed. Changing some configuration options, for example max degree of parallelism, will clear the entire plan cache. Alternatively, some statements, such as ALTER DATABASE with certain options, will clear all the plans associated with that particular database.

Hinting

Most of the time, the Query Optimizer does a great job at choosing highly efficient execution plans. However, there may be cases when the selected execution plan does not perform as expected. It is vitally important to differentiate between when these cases arise because you are not providing the Query Optimizer with all the information it needs to do a good job, and when the problem arises because of a Query Optimizer limitation.
The reality is that, even after more than 30 years of research, query optimizers are highly complex pieces of software which still face some technical challenges, some of which will be mentioned in the next section. As a result, there may be cases when, even after you’ve provided the Query Optimizer with all the information it needs and there doesn’t seem to be any apparent problem, you are still not getting an efficient plan; in these cases you may want to resort to hints. However, since hints let you override the operations of the Query Optimizer, they need to be used with caution, and only as a last resort when no other option is available. Hints are instructions that you can send to the Query Optimizer to influence a particular area of an execution plan. For example, you can use hints to direct the Query Optimizer to use a particular index or a specific join algorithm. You can even ask the Query Optimizer to use a specific execution plan, provided that you specify one in XML format.
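As a concrete illustration of a join hint, the C# snippet below (a minimal sketch only; the connection string, table, and column names are hypothetical) runs a query whose OPTION (HASH JOIN) clause tells the Query Optimizer to use a hash join for every join in the statement, regardless of what its own cost estimates would have chosen:

    using System;
    using System.Data.SqlClient;

    class JoinHintExample
    {
        static void Main()
        {
            // Hypothetical database and schema, used purely for illustration.
            string connectionString = "Server=.;Database=SampleDb;Integrated Security=true";

            // OPTION (HASH JOIN) forces hash joins for all joins in this query.
            string query = @"SELECT c.CustomerName, o.OrderDate
                             FROM dbo.Customers AS c
                             INNER JOIN dbo.Orders AS o ON o.CustomerId = c.CustomerId
                             OPTION (HASH JOIN);";

            using (SqlConnection connection = new SqlConnection(connectionString))
            using (SqlCommand command = new SqlCommand(query, connection))
            {
                connection.Open();
                using (SqlDataReader reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        Console.WriteLine("{0} - {1}", reader[0], reader[1]);
                    }
                }
            }
        }
    }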

Ongoing Query Optimizer Challenges

Query optimization is an inherently complex problem, not only in SQL Server but in any other relational database system. Despite the fact that query optimization research dates back to the early seventies, challenges in some fundamental areas are still being addressed today. The first major impediment to a query optimizer finding an optimal plan is the fact that, for many queries, it is just not possible to explore the entire search space. An effect known as combinatorial explosion makes this exhaustive enumeration impossible, as the number of possible plans grows very rapidly depending on the number of tables joined in the query. To make the search a manageable process, heuristics are used to limit the search space. However, if a query optimizer is not able to explore the entire search space, there is no way to prove that you can get an absolutely optimal plan, or even that the best plan is among the candidates being considered. As a result, it is clearly extremely important that the set of plans which a query optimizer considers contains plans with low costs.
This leads us to another major technical challenge for the Query Optimizer: accurate cost and cardinality estimation. Since a cost-based optimizer selects the execution plan with the lowest cost, the quality of the plan selection is only as good as the accuracy of the optimizer’s cost and cardinality estimations. Even supposing that time is not a concern and that the query optimizer can analyze the entire search space without a problem, cardinality and cost estimation errors can still make a query optimizer select the wrong plan. Cost estimation models are inherently inexact, as they do not consider all of the hardware conditions, and must necessarily make certain assumptions about the environment. For example, the costing model assumes that every query starts with a cold cache; that is, its data is read from disk and not from memory, and this assumption could lead to costing estimation errors in some cases. In addition, cost estimation relies on cardinality estimation, which is also inexact and has some known limitations, especially when it comes to the estimation of the intermediate results in a plan. On top of all that, there are some operations which are not covered by the mathematical model of the cardinality estimation component, which has to resort to guess logic or heuristics to deal with these situations.

A Historical Perspective

We’ve seen some of the challenges query optimizers still face today, but these imperfections are not for want of time or research. One of the earliest works describing a cost-based query optimizer was “Access Path Selection in a Relational Database Management System”, published in 1979 by Pat Selinger et al to describe the query optimizer for an experimental database management system developed in 1975 at what is now the IBM Almaden Research Center. This "System R" management system advanced the field of query optimization by introducing the use of cost-based query optimization, the use of statistics, an efficient method of determining join orders, and the addition of CPU cost to the optimizer's cost estimation formulae.
Yet despite being an enormous influence in the field of query optimization research, it suffered a major drawback: its framework could not be easily extended to include additional transformations. This led to the development of more extensible optimization architectures, which facilitated the gradual addition of new functionality to query optimizers. The trailblazers in this field were the Exodus Extensible DBMS Project, and later the Volcano Optimizer generator, the latter of which was defined by Goetz Graefe (who was also involved in the Exodus Project) and William McKenna. Goetz Graefe then went on to define the Cascades Framework, resolving errors which were present in his previous two endeavors.
While this is interesting, what's most relevant for you and me is that SQL Server implemented its own cost-based Query Optimizer based on the Cascades Framework in 1999, when its database engine was re-architected for the release of SQL Server 7.0. The extensible architecture of the Cascades Framework has made it much easier for new functionality, such as new transformation rules or physical operators, to be implemented in the Query Optimizer. This allows the performance of the Query Optimizer to be constantly tuned and improved.


The SQL Server Query Optimizer - 1


The SQL Server Query Optimizer is a Cost-Based Optimizer. It analyzes a number of candidate execution plans for a given query, estimates the cost of each of these plans and selects the plan with the lowest cost of the choices considered. Indeed, given that the Query Optimizer cannot consider every possible plan for every query, it actually has to do a cost-based balancing act, considering both the cost of finding potential plans and the costs of plans themselves.
Therefore, it is the SQL Server component that has the biggest impact on the performance of your databases. After all, selecting the right (or wrong) execution plan could mean the difference between a query execution time of milliseconds, and one of minutes or even hours. Naturally, a better understanding of how the Query Optimizer works can help both database administrators and developers to write better queries and to provide the Query Optimizer with the information it needs to produce efficient execution plans.

 

How the Query Optimizer Works

At the core of the SQL Server Database Engine are two major components: the Storage Engine and the Query Processor, also called the Relational Engine. The Storage Engine is responsible for reading data between the disk and memory in a manner that optimizes concurrency while maintaining data integrity. The Query Processor, as the name suggests, accepts all queries submitted to SQL Server, devises a plan for their optimal execution, and then executes the plan and delivers the required results.
Queries are submitted to SQL Server using the SQL language (or T-SQL, the Microsoft SQL Server extension to SQL). Since SQL is a high-level declarative language, it only defines what data to get from the database, not the steps required to retrieve that data, or any of the algorithms for processing the request. Thus, for each query it receives, the first job of the query processor is to devise a plan, as quickly as possible, which describes the best possible way to execute said query (or, at the very least, an efficient way). Its second job is to execute the query according to that plan.
Each of these tasks is delegated to a separate component within the query processor; the Query Optimizer devises the plan and then passes it along to the Execution Engine, which will actually execute the plan and get the results from the database.
In order to arrive at what it believes to be the best plan for executing a query, the Query Processor performs a number of different steps; the entire query processing process is shown in Figure 1.

Figure 1 - The Query Processing Process
  • Parsing and binding – the query is parsed and bound. Assuming the query is valid, the output of this phase is a logical tree, with each node in the tree representing a logical operation that the query must perform, such as reading a particular table, or performing an inner join. This logical tree is then used to run the query optimization process, which roughly consists of the following two steps;
  • Generate possible execution plans – using the logical tree, the Query Optimizer devises a number of possible ways to execute the query i.e. a number of possible execution plans. An execution plan is, in essence, a set of physical operations (an index seek, a nested loop join, and so on), that can be performed to produce the required result, as described by the logical tree;
  • Cost-assessment of each plan – While the Query Optimizer does not generate every possible execution plan, it assesses the resource and time cost of each plan it does generate. The plan that the Query Optimizer deems to have the lowest cost of those it’s assessed is selected, and passed along to the Execution Engine;
  • Query execution, plan caching – the query is executed by the Execution Engine, according to the selected plan. The plan may be stored in memory, in the plan cache.
Parsing and binding are the first operations performed when a query is submitted to a SQL Server instance. Parsing makes sure that the T-SQL query has a valid syntax, and translates the SQL query into an initial tree representation: specifically, a tree of logical operators representing the high-level steps required to execute the query in question. Initially, these logical operators will be closely related to the original syntax of the query, and will include such logical operations as “get data from the Customer table”, “get data from the Contact table”, “perform an inner join”, and so on. Different tree representations of the query will be used throughout the optimization process, and this logical tree will receive different names until it is finally used to initialize the Memo structure, as will be discussed later.
Binding is mostly concerned with name resolution. During the binding operation, SQL Server makes sure that all the object names do exist, and associates every table and column name on the parse tree with their corresponding object in the system catalog. The output of this second process is called an algebrized tree, which is then sent to the Query Optimizer.
The next step is the optimization process, which is basically the generation of candidate execution plans and the selection of the best of these plans according to their cost. As has already been mentioned, SQL Server uses a cost-based optimizer, and uses a cost estimation model to estimate the cost of each of the candidate plans.
In essence, query optimization is the process of mapping the logical query operations expressed in the tree representation to physical operations, which can be carried out by the execution engine. So it's actually the functionality of the execution engine that is being implemented in the execution plans being created by the Query Optimizer, that is, the execution engine implements a certain number of different algorithms and it is from these algorithms that the Query Optimizer must choose, when formulating its execution plans. It does this by translating the original logical operations into the physical operations that the execution engine is capable of performing, and execution plans show both the logical and physical operations. Some logical operations, such as a Sort, translate to the same physical operation, whereas other logical operations map to several possible physical operations. For example, a logical join can be mapped to a Nested Loops Join, Merge Join, or Hash Join physical operator.
Thus, the end product of the query optimization process is an execution plan: a tree consisting of a number of physical operators, which contain the algorithms to be performed by the execution engine in order to obtain the desired results from the database.
The next part will explain the following concepts:
  • Generating Candidate Execution Plans. 
  • Assessing the Cost of each Plan. 
  • Query Execution and Plan Caching. 
  • Hinting.
  • Ongoing Query Optimizer Challenges. 
  • A Historical Perspective.