Saturday, October 31, 2015

Lesson 2: Adding Looping

 

Applies To: SQL Server 2016 Preview
In Lesson 1: Creating the Project and Basic Package, you created a package that extracted data from a single flat file source, transformed the data using Lookup transformations, and finally loaded the data into the FactCurrency fact table of the AdventureWorksDW2012 sample database.
However, it is rare for an extract, transform, and load (ETL) process to use a single flat file. A typical ETL process would extract data from multiple flat file sources. Extracting data from multiple sources requires an iterative control flow. One of the most anticipated features of Microsoft Integration Services is the ability to easily add iteration or looping to packages.
Integration Services provides two types of containers for looping through packages: the Foreach Loop container and the For Loop container. The Foreach Loop container uses an enumerator to perform the looping, whereas the For Loop container typically uses a variable expression. This lesson uses the Foreach Loop container.
The Foreach Loop container enables a package to repeat the control flow for each member of a specified enumerator. With the Foreach Loop container, you can enumerate:
  • ADO recordset rows
  • ADO .Net schema information
  • File and directory structures
  • System, package and user variables
  • Enumerable objects contained in a variable
  • Items in a collection
  • Nodes in an XML Path Language (XPath) expression
  • SQL Server Management Objects (SMO)
In this lesson, you will modify the simple ETL package created in Lesson 1 to take advantage of the Foreach Loop container. You will also set user-defined package variables to enable the tutorial package to iterate through all the flat files in the folder. If you have not completed the previous lesson, you can also copy the completed Lesson 1 package that is included with the tutorial.
In this lesson, you will not modify the data flow, only the control flow.
System_CAPS_importantImportant
This tutorial requires the AdventureWorksDW2012 sample database. For more information about how to install and deploy AdventureWorksDW2012, see Reporting Services Product Samples on CodePlex.

This lesson contains the following tasks:
  • Step 1: Copying the Lesson 1 Package
  • Step 2: Adding and Configuring the Foreach Loop Container
  • Step 3: Modifying the Flat File Connection Manager
  • Step 4: Testing the Lesson 2 Tutorial Package



    Step 1: Copying the Lesson 1 Package

    Applies To: SQL Server 2016 Preview
    In this task, you will create a copy of the Lesson 1.dtsx package that you created in Lesson 1. If you did not complete Lesson 1, you can add the completed lesson 1 package that is included with the tutorial to the project, and then copy it instead. You will use this new copy throughout the rest of Lesson 2.

    To create the Lesson 2 package

    1. If SQL Server Data Tools is not already open, click Start, point to All Programs, point to Microsoft SQL Server 2012, and then click SQL Server Data Tools.
    2. On the File menu, click Open, click Project/Solution, click the SSIS Tutorial folder and click Open, and then double-click SSIS Tutorial.sln.
    3. In Solution Explorer, right-click Lesson 1.dtsx, and then click Copy.
    4. In Solution Explorer, right-click SSIS Packages, and then click Paste.
      By default, the copied package will be named Lesson 2.dtsx.
    5. In Solution Explorer, double-click Lesson 2.dtsx to open the package
    6. Right-click anywhere in the background of the Control Flow design surface and click Properties.
    7. In the Properties window, update the Name property to Lesson 2.
    8. Click the box for the ID property, click the dropdown arrow and then click <Generate New ID>.

    To add the completed Lesson 1 package

    1. Open SQL Server Data Tools and open the SSIS Tutorial project.
    2. In Solution Explorer, right-click SSIS Packages, and click Add Existing Package.
    3. In the Add Copy of Existing Package dialog box, in Package location, select File system.
    4. Click the browse (…) button, navigate to Lesson 1.dtsx on your machine, and then click Open.
      To download all of the lesson packages for this tutorial, do the following.
      1. Navigate to Integration Services Product Samples
      2. Click the DOWNLOADS tab.
      3. Click the SQL2012.Integration_Services.Create_Simple_ETL_Tutorial.Sample.zip file.
    5. Copy and paste the Lesson 1 package as described in steps 3-8 in the previous procedure.

    Step 2: Adding and Configuring the Foreach Loop Container

    Applies To: SQL Server 2016 Preview
    In this task, you will add the ability to loop through a folder of flat files and apply the same data flow transformation used in Lesson 1 to each of those flat files. You do this by adding and configuring a Foreach Loop container to the control flow.
    The Foreach Loop container that you add must be able to connect to each flat file in the folder. Because all the files in the folder have the same format, the Foreach Loop container can use the same Flat File connection manager to connect to each of these files. The Flat File connection manager that the container will use is the same Flat File connection manager that you created in Lesson 1.
    Currently, the Flat File connection manager from Lesson 1 connects to only one, specific flat file. To iteratively connect to each flat file in the folder, you will have to configure both the Foreach Loop container and the Flat File connection manager as follows:
    • Foreach Loop container:   You will map the enumerated value of the container to a user-defined package variable. The container will then use this user-defined variable to dynamically modify the ConnectionString property of the Flat File connection manager and iteratively connect to each flat file in the folder.
    • Flat File connection manager:   You will modify the connection manager that was created in Lesson 1 by using a user-defined variable to populate the connection manager's ConnectionString property.
    The procedures in this task show you how to create and modify the Foreach Loop container to use a user-defined package variable and to add the data flow task to the loop. You will learn how to modify the Flat File connection manager to use a user-defined variable in the next task.
    After you have made these modifications to the package, when the package is run, the Foreach Loop Container will iterate through the collection of files in the Sample Data folder. Each time a file is found that matches the criteria, the Foreach Loop Container will populate the user-defined variable with the file name, map the user-defined variable to the ConnectionString property of the Sample Currency Data Flat File connection manager, and then run the data flow against that file. Therefore, in each iteration of the Foreach Loop the Data Flow task will consume a different flat file.
    System_CAPS_noteNote
    Because Microsoft Integration Services separates control flow from data flow, any looping that you add to the control flow will not require modification to the data flow. Therefore, the data flow that you created in Lesson 1 does not have to be changed.

    To add a Foreach Loop container

    1. In SQL Server Data Tools, click the Control Flow tab.
    2. In the SSIS Toolbox, expand Containers, and then drag a Foreach Loop Container onto the design surface of the Control Flow tab.
    3. Right-click the newly added Foreach Loop Container and select Edit.
    4. In the Foreach Loop Editor dialog box, on the General page, for Name, enter Foreach File in Folder. Click OK.
    5. Right-click the Foreach Loop container, click Properties, and in the Properties window, verify that the LocaleID property is set to English (United States).

    To configure the enumerator for the Foreach Loop container

    1. Double-click Foreach File in Folder to reopen the Foreach Loop Editor.
    2. Click Collection.
    3. On the Collection page, select Foreach File Enumerator.
    4. In the Enumerator configuration group, click Browse.
    5. In the Browse for Folder dialog box, locate the folder on your machine that contains the Currency_*.txt files.
      This sample data is included with the SSIS lesson packages. To download the sample data and the lesson packages, do the following.
      1. Navigate to Integration Services Product Samples
      2. Click the DOWNLOADS tab.
      3. Click the HYPERLINK "http://msftisprodsamples.codeplex.com/downloads/get/578097" SQL2012.Integration_Services.Create_Simple_ETL_Tutorial.Sample.zip file.
    6. In the Files box, type Currency_*.txt.

    To map the enumerator to a user-defined variable

    1. Click Variable Mappings.
    2. On the Variable Mappings page, in the Variable column, click the empty cell and select <New Variable…>.
    3. In the Add Variable dialog box, for Name, type varFileName.
      System_CAPS_importantImportant
      Variable names are case sensitive.
    4. Click OK.
    5. Click OK again to exit the Foreach Loop Editor dialog box.

    To add the data flow task to the loop

    • Drag the Extract Sample Currency Data data flow task onto the Foreach Loop container now renamed Foreach File in Folder.



      Step 3: Modifying the Flat File Connection Manager

       

      Applies To: SQL Server 2016 Preview
      In this task, you will modify the Flat File connection manager that you created and configured in Lesson 1. When originally created, the Flat File connection manager was configured to statically load a single file. To enable the Flat File connection manager to iteratively load files, you must modify the ConnectionString property of the connection manager to accept the user-defined variable User:varFileName, which contains the path of the file to be loaded at run time.
      By modifying the connection manager to use the value of the user-defined variable, User::varFileName, to populate the ConnectionString property of the connection manager, the connection manager will be able to connect to different flat files. At run time, each iteration of the Foreach Loop container will dynamically update the User::varFileName variable. Updating the variable, in turn, causes the connection manager to connect to a different flat file, and the data flow task to process a different set of data.

      To configure the Flat File connection manager to use a variable for the connection string

      1. In the Connection Managers pane, right-click Sample Flat File Source Data, and select Properties.
      2. In the Properties window, for Expressions, click in the empty cell, and then click the ellipsis button (…).
      3. In the Property Expressions Editor dialog box, in the Property column, type or select ConnectionString.
      4. In the Expression column, click the ellipsis button (…) to open the Expression Builder dialog box.
      5. In the Expression Builder dialog box, expand the Variables node.
      6. Drag the variable, User::varFileName, into the Expression box.
      7. Click OK to close the Expression Builder dialog box.
      8. Click OK again to close the Property Expressions Editor dialog box.

       

      Step 4: Testing the Lesson 2 Tutorial Package

       

      Applies To: SQL Server 2016 Preview
      With the Foreach Loop container and the Flat File connection manager now configured, the Lesson 2 package can iterate through the collection of 14 flat files in the Sample Data folder. Each time a file name is found that matches the specified file name criteria, the Foreach Loop container populates the user-defined variable with the file name. This variable, in turn, updates the ConnectionString property of the Flat File connection manager, and a connection to the new flat file is made. The Foreach Loop container then runs the unmodified data flow task against the data in the new flat file before connecting to the next file in the folder.
      Use the following procedure to test the new looping functionality that you have added to your package.
      System_CAPS_noteNote
      If you ran the package from Lesson 1, you will need to delete the records from dbo.FactCurrency in AdventureWorksDW2012 before you run the package from this lesson or the package will fail with errors indicating a Violation of Primary Key constraint. You will receive the same errors if you run the package by selecting Debug/Start Debugging (or press F5) because both Lesson 1 and Lesson 2 will run. Lesson 2 will attempt to insert records already inserted in Lesson 1.

      Before you test the package you should verify that the control and data flows in the Lesson 2 package contains the objects shown in the following diagrams. The data flow should be identical to the data flow in lesson 1.
      Control Flow
      Control flow in package
      Data Flow
      Data flow in package

      To test the Lesson 2 tutorial package

      1. In Solution Explorer, right-click Lesson 2.dtsx and click Execute Package.
        The package will run. You can verify the status of each loop in the Output window, or by clicking on the Progress tab. For example, you can see that 1097 lines were added to the destination table from the file Currency_VEB.txt.
      2. After the package has completed running, on the Debug menu, click Stop Debugging.


         

         

       

     

2 comments:

  1. How do you "delete the records from dbo.FactCurrency in AdventureWorksDW2012"?

    ReplyDelete
  2. I was able to delete records with this query:

    DELETE
    FROM [AdventureWorksDW2012].[dbo].[FactCurrencyRate]

    I then tested Lesson 2 and it ran ok with no errors.

    ReplyDelete