Saturday, October 31, 2015

 
 
Lesson 1: Creating the Project and Basic Package
 
In this lesson, you will create a simple ETL package that extracts data from a single flat file, transforms the data using lookup transformations and finally loads the result into a fact table destination.


Applies To: SQL Server 2016 Preview

In this lesson, you will create a simple ETL package that extracts data from a single flat file source, transforms the data using two lookup transformation components, and writes that data to the FactCurrency fact table in AdventureWorksDW2012. As part of this lesson, you will learn how to create new packages, add and configure data source and destination connections, and work with new control flow and data flow components.

 System_CAPS_importantImportant
This tutorial requires the AdventureWorksDW2012 sample database. For more information on installing and deploying AdventureWorksDW2012, see Reporting Services Product Samples on CodePlex.
___________________________________________________________________________

Understanding the Package Requirements

This tutorial requires Microsoft SQL Server Data Tools.
For more information on installing the SQL Server Data Tools see SQL Server Data Tools Download.
Before creating a package, you need a good understanding of the formatting used in both the source data and the destination. Once you understand both of these data formats, you will be ready to define the transformations necessary to map the source data to the destination.

Looking at the Source

For this tutorial, the source data is a set of historical currency data contained in the flat file, SampleCurrencyData.txt. The source data has the following four columns: the average rate of the currency, a currency key, a date key, and the end-of-day rate.
Here is an example of the source data contained in the SampleCurrencyData.txt file:

1.00070049USD9/3/05 0:001.001201442
1.00020004USD9/4/05 0:001
1.00020004USD9/5/05 0:001.001201442
1.00020004USD9/6/05 0:001
1.00020004USD9/7/05 0:001.00070049
1.00070049USD9/8/05 0:000.99980004
1.00070049USD9/9/05 0:001.001502253
1.00070049USD9/10/05 0:000.99990001
1.00020004USD9/11/05 0:001.001101211
1.00020004USD9/12/05 0:000.99970009

When working with flat file source data, it is important to understand how the Flat File connection manager interprets the flat file data. If the flat file source is Unicode, the Flat File connection manager defines all columns as [DT_WSTR] with a default column width of 50. If the flat file source is ANSI-encoded, the columns are defined as [DT_STR] with a column width of 50. You will probably have to change these defaults to make the string column types more appropriate for your data. To do this, you will need to look at the data type of the destination where the data will be written to and then choose the correct type within the Flat File connection manager.

Looking at the Destination

The ultimate destination for the source data is the FactCurrency fact table in AdventureWorksDW. The FactCurrency fact table has four columns, and has relationships to two dimension tables, as shown in the following table.

Column NameData TypeLookup TableLookup Column
AverageRatefloatNoneNone
CurrencyKeyint (FK)DimCurrencyCurrencyKey (PK)
DateKeyInt (FK)DimDateDateKey (PK)
EndOfDayRatefloatNoneNone

Mapping Source Data to be Compatible with the Destination

Analysis of the source and destination data formats indicates that lookups will be necessary for the CurrencyKey and DateKey values. The transformations that will perform these lookups will obtain the CurrencyKey and DateKey values by using the alternate keys from DimCurrency and DimDate dimension tables.
Flat File ColumnTable NameColumn NameData Type
0FactCurrencyAverageRatefloat
1DimCurrencyCurrencyAlternateKeynchar (3)
2DimDateFullDateAlternateKeydate
3FactCurrencyEndOfDayRatefloat

Lesson Tasks

This lesson contains the following tasks:
  • Step 1: Creating a New Integration Services Project
  • Step 2: Adding and Configuring a Flat File Connection Manager
  • Step 3: Adding and Configuring an OLE DB Connection Manager
  • Step 4: Adding a Data Flow Task to the Package
     
  • Step 5: Adding and Configuring the Flat File Source
  • Step 6: Adding and Configuring the Lookup Transformations
  • Step 7: Adding and Configuring the OLE DB Destination
  • Step 8: Making the Lesson 1 Package Easier to Understand
  • Step 9: Testing the Lesson 1 Tutorial Package

**************************************************************************************

Step 1: Creating a New Integration Services Project

SQL Server 2016
 Applies To: SQL Server 2016 Preview
The first step in creating a package in Integration Services is to create an Integration Services project. This project includes the templates for the objects — data sources, data source views, and packages — that you use in a data transformation solution.
The packages that you will create in this Integration Services tutorial interpret the values of locale-sensitive data. If your computer is not configured to use the regional option English (United States), you need to set additional properties in the package. The packages that you use in lessons 2 through 5 are copied from the package created in lesson 1, and you need not update locale-sensitive properties in the copied packages.

System_CAPS_noteNote
This tutorial requires Microsoft SQL Server Data Tools.
For more information on installing the SQL Server Data Tools see SQL Server Data Tools Download.

To create a new Integration Services project

  1. On the Start menu, point to All Programs, point to Microsoft SQL Server, and click SQL Server Data Tools.
  2. On the File menu, point to New, and click Project to create a new Integration Services project.
  3. In the New Project dialog box, expand the Business Intelligence node under Installed Templates, and select Integration Services Project in the Templates pane.
  4. In the Name box, change the default name to SSIS Tutorial. Optionally, clear the Create directory for solution check box.
  5. Accept the default location, or click Browse to browse to locate the folder you want to use. In the Project Location dialog box, click the folder and click Select Folder.
  6. Click OK.
    By default, an empty package, titled Package.dtsx, will be created and added to your project under SSIS Packages.
  7. In Solution Explorer toolbar, right-click Package.dtsx, click Rename, and rename the default package to Lesson 1.dtsx.

Next Task in Lesson

Step 2: Adding and Configuring a Flat File Connection Manager

SQL Server 2016
Other Versions

 Applies To: SQL Server 2016 Preview
In this task, you add a Flat File connection manager to the package that you just created. A Flat File connection manager enables a package to extract data from a flat file. Using the Flat File connection manager, you can specify the file name and location, the locale and code page, and the file format, including column delimiters, to apply when the package extracts data from the flat file. In addition, you can manually specify the data type for the individual columns, or use the Suggest Column Types dialog box to automatically map the columns of extracted data to Integration Services data types.
You must create a new Flat File connection manager for each file format that you work with. Because this tutorial extracts data from multiple flat files that have exactly the same data format, you will need to add and configure only one Flat File connection manager for your package.
For this tutorial, you will configure the following properties in your Flat File connection manager:
  • Column names:   Because the flat file does not have column names, the Flat File connection manager creates default column names. These default names are not useful for identifying what each column represents. To make these default names more useful, you need to change the default names to names that match the fact table into which the flat file data is to be loaded.
  • Data mappings:   The data type mappings that you specify for the Flat File connection manager will be used by all flat file data source components that reference the connection manager. You can either manually map the data types by using the Flat File connection manager, or you can use the Suggest Column Types dialog box. In this tutorial, you will view the mappings suggested in the Suggest Column Types dialog box and then manually make the necessary mappings in the Flat File Connection Manager Editor dialog box.
The Flat File connection manager provides locale information about the data file. If your computer is not configured to use the regional option English (United States), you must set additional properties in the Flat File Connection Manager Editor dialog box.

To add a Flat File connection manager to the SSIS package

  1. Right-click anywhere in the Connection Managers area, and then click New Flat File Connection.
  2. In the Flat File Connection Manager Editor dialog box, for Connection manager name, type Sample Flat File Source Data.
  3. Click Browse.
  4. In the Open dialog box, locate the SampleCurrencyData.txt file on your machine.
    The sample data is included with the SSIS lesson packages. To download the sample data and the lesson packages, do the following.
    1. Navigate to Integration Services Product Samples
    2. Click the DOWNLOADS tab.
    3. Click the SQL2012.Integration_Services.Create_Simple_ETL_Tutorial.Sample.zip file.
  5. Clear the Column names in the first data row checkbox.

To set locale sensitive properties

  1. In the Flat File Connection Manager Editor dialog box, click General.
  2. Set Locale to English (United States) and Code page to 1252.

To rename columns in the Flat File connection manager

  1. In the Flat File Connection Manager Editor dialog box, click Advanced.
  2. In the property pane, make the following changes:
    • Change the Column 0 name property to AverageRate.
    • Change the Column 1 name property to CurrencyID.
    • Change the Column 2 name property to CurrencyDate.
    • Change the Column 3 name property to EndOfDayRate.
    System_CAPS_noteNote
    By default, all four of the columns are initially set to a string data type [DT_STR] with an OutputColumnWidth of 50.

To remap column data types

  1. In the Flat File Connection Manager Editor dialog box, click Suggest Types.
    Integration Services automatically suggests the most appropriate data types based on the first 200 rows of data. You can also change these suggestion options to sample more or less data, to specify the default data type for integer or Boolean data, or to add spaces as padding to string columns.
    For now, make no changes to the options in the Suggest Column Types dialog box, and click OK to have Integration Services suggest data types for columns. This returns you to the Advanced pane of the Flat File Connection Manager Editor dialog box, where you can view the column data types suggested by Integration Services. (If you click Cancel, no suggestions are made to column metadata and the default string (DT_STR) data type is used.)
    In this tutorial, Integration Services suggests the data types shown in the second column of the following table for the data from the SampleCurrencyData.txt file. However, the data types that are required for the columns in the destination, which will be defined in a later step, are shown in the last column of the following table.
    Flat File ColumnSuggested TypeDestination ColumnDestination Type
    AverageRatefloat [DT_R4]FactCurrency.AverageRatefloat
    CurrencyIDstring [DT_STR]DimCurrency.CurrencyAlternateKeynchar(3)
    CurrencyDatedate [DT_DATE]DimDate.FullDateAlternateKeydate
    EndOfDayRatefloat [DT_R4]FactCurrency.EndOfDayRatefloat
    The data type suggested for the CurrencyID column is incompatible with the data type of the field in the destination table. Because the data type of DimCurrency.CurrencyAlternateKey is nchar (3), CurrencyID must be changed from string [DT_STR] to string [DT_WSTR]. Additionally, the field DimDate.FullDateAlternateKey is defined as a date data type; therefore, CurrencyDate needs to be changed from date [DT_Date] to database date [DT_DBDATE].
  2. In the list, select the CurrencyID column and in the property pane, change the Data Type of column CurrencyID from string [DT_STR] to Unicode string [DT_WSTR].
  3. In the property pane, change the data type of column CurrencyDate from date [DT_DATE] to database date [DT_DBDATE].
  4. Click OK.

Next Task in Lesson

Step 3: Adding and Configuring an OLE DB Connection Manager

SQL Server 2016
Other Versions
 Applies To: SQL Server 2016 Preview
After you have added a Flat File connection manager to connect to the data source, the next task is to add an OLE DB connection manager to connect to the destination. An OLE DB connection manager enables a package to extract data from or load data into any OLE DB–compliant data source. Using the OLE DB Connection manager, you can specify the server, the authentication method, and the default database for the connection.
In this lesson, you will create an OLE DB connection manager that uses Windows Authentication to connect to the local instance of AdventureWorksDB2012. The OLE DB connection manager that you create will also be referenced by other components that you will create later in this tutorial, such as the Lookup transformation and the OLE DB destination.

To add and configure an OLE DB Connection Manager to the SSIS package

  1. Right-click anywhere in the Connection Managers area, and then click New OLE DB Connection.
  2. In the Configure OLE DB Connection Manager dialog box, click New.
  3. For Server name, enter localhost.
    When you specify localhost as the server name, the connection manager connects to the default instance of SQL Server on the local computer. To use a remote instance of SQL Server, replace localhost with the name of the server to which you want to connect.
  4. In the Log on to the server group, verify that Use Windows Authentication is selected.
  5. In the Connect to a database group, in the Select or enter a database name box, type or select AdventureWorksDW2012.
  6. Click Test Connection to verify that the connection settings you have specified are valid.
  7. Click OK.
  8. Click OK.
  9. In the Data Connections pane of the Configure OLE DB Connection Manager dialog box, verify that localhost.AdventureWorksDW2012 is selected.
  10. Click OK.

Next Task in Lesson

Step 4: Adding a Data Flow Task to the Package

SQL Server 2016
Other Versions

 Applies To: SQL Server 2016 Preview
After you have created the connection managers for the source and destination data, the next task is to add a Data Flow task to your package. The Data Flow task encapsulates the data flow engine that moves data between sources and destinations, and provides the functionality for transforming, cleaning, and modifying data as it is moved. The Data Flow task is where most of the work of an extract, transform, and load (ETL) process occurs.
System_CAPS_noteNote
SQL Server Integration Services separates data flow from control flow.

To add a Data Flow task

  1. Click the Control Flow tab.
  2. In the SSIS Toolbox, expand Favorites, and drag a Data Flow Task onto the design surface of the Control Flow tab.
    System_CAPS_noteNote
    If the SSIS Toolbox isn’t available, on the main menu select SSIS then SSIS Toolbox to display the SSIS Toolbox.
  3. On the Control Flow design surface, right-click the newly added Data Flow Task, click Rename, and change the name to Extract Sample Currency Data.
    It is good practice to provide unique names to all components that you add to a design surface. For ease of use and maintainability, the names should describe the function that each component performs. Following these naming guidelines allows your Integration Services packages to be self-documenting. Another way to document your packages is by using annotations. For more information about annotations, see Use Annotations in Packages.
  4. Right-click the Data Flow task, click Properties, and in the Properties window, verify that the LocaleID property is set to English (United States).

Next Task in Lesson

Step 5: Adding and Configuring the Flat File Source

SQL Server 2016
 Applies To: SQL Server 2016 Preview
In this task, you will add and configure a Flat File source to your package. A Flat File source is a data flow component that uses metadata defined by a Flat File connection manager to specify the format and structure of the data to be extracted from the flat file by a transform process. The Flat File source can be configured to extract data from a single flat file by using the file format definition provided by the Flat File connection manager.
For this tutorial, you will configure the Flat File source to use the Sample Flat File Source Data connection manager that you previously created.

To add a Flat File Source component

  1. Open the Data Flow designer, either by double-clicking the Extract Sample Currency Data data flow task or by clicking the Data Flow tab.
  2. In the SSIS Toolbox, expand Other Sources, and then drag a Flat File Source onto the design surface of the Data Flow tab.
  3. On the Data Flow design surface, right-click the newly added Flat File Source, click Rename, and change the name to Extract Sample Currency Data.
  4. Double-click the Flat File source to open the Flat File Source Editor dialog box.
  5. In the Flat file connection manager box, select Sample Flat File Source Data.
  6. Click Columns and verify that the names of the columns are correct.
  7. Click OK.
  8. Right-click the Flat File source and click Properties.
  9. In the Properties window, verify that the LocaleID property is set to English (United States).

Next Task in Lesson

Step 6: Adding and Configuring the Lookup Transformations

SQL Server 2016

 Applies To: SQL Server 2016 Preview
After you have configured the Flat File source to extract data from the source file, the next task is to define the Lookup transformations needed to obtain the values for the CurrencyKey and DateKey. A Lookup transformation performs a lookup by joining data in the specified input column to a column in a reference dataset. The reference dataset can be an existing table or view, a new table, or the result of an SQL statement. In this tutorial, the Lookup transformation uses an OLE DB connection manager to connect to the database that contains the data that is the source of the reference dataset.
System_CAPS_noteNote
You can also configure the Lookup transformation to connect to a cache that contains the reference dataset. For more information, see Lookup Transformation.
For this tutorial, you will add and configure the following two Lookup transformation components to the package:
  • One transformation to perform a lookup of values from the CurrencyKey column of the DimCurrency dimension table based on matching CurrencyID column values from the flat file.
  • One transformation to perform a lookup of values from the DateKey column of the DimDate dimension table based on matching CurrencyDate column values from the flat file.
In both cases, the Lookup transformation will utilize the OLE DB connection manager that you previously created.

To add and configure the Lookup Currency Key transformation

  1. In the SSIS Toolbox, expand Common, and then drag Lookup onto the design surface of the Data Flow tab. Place Lookup directly below the Extract Sample Currency Data source.
  2. Click the Extract Sample Currency Data flat file source and drag the green arrow onto the newly added Lookup transformation to connect the two components.
  3. On the Data Flow design surface, click Lookup in the Lookup transformation, and change the name to Lookup Currency Key.
  4. Double-click the Lookup CurrencyKey transformation to display the Lookup Transformation Editor.
  5. On the General page, make the following selections:
    1. Select Full cache.
    2. In the Connection type area, select OLE DB connection manager.
  6. On the Connection page, make the following selections:
    1. In the OLE DB connection manager dialog box, ensure that localhost.AdventureWorksDW2012 is displayed.
    2. Select Use results of an SQL query, and then type or copy the following SQL statement:






      select * from (select * from [dbo].[DimCurrency]) as refTable
      where [refTable].[CurrencyAlternateKey] = 'ARS'
      OR
      [refTable].[CurrencyAlternateKey] = 'AUD'
      OR
      [refTable].[CurrencyAlternateKey] = 'BRL'
      OR
      [refTable].[CurrencyAlternateKey] = 'CAD'
      OR
      [refTable].[CurrencyAlternateKey] = 'CNY'
      OR
      [refTable].[CurrencyAlternateKey] = 'DEM'
      OR
      [refTable].[CurrencyAlternateKey] = 'EUR'
      OR
      [refTable].[CurrencyAlternateKey] = 'FRF'
      OR
      [refTable].[CurrencyAlternateKey] = 'GBP'
      OR
      [refTable].[CurrencyAlternateKey] = 'JPY'
      OR
      [refTable].[CurrencyAlternateKey] = 'MXN'
      OR
      [refTable].[CurrencyAlternateKey] = 'SAR'
      OR
      [refTable].[CurrencyAlternateKey] = 'USD'
      OR
      [refTable].[CurrencyAlternateKey] = 'VEB'
       
      
  • On the Columns page, make the following selections:
    • In the Available Input Columns panel, drag CurrencyID to the Available Lookup Columns panel and drop it on CurrencyAlternateKey.
    • In the Available Lookup Columns list, select the check box to the left of CurrencyKey.
    • Click OK to return to the Data Flow design surface.
    • Right-click the Lookup Currency Key transformation, click Properties.
    • In the Properties window, verify that the LocaleID property is set to English (United States) and the DefaultCodePage property is set to 1252.

To add and configure the Lookup DateKey transformation

  • In the SSIS Toolbox, drag Lookup onto the Data Flow design surface. Place Lookup directly below the Lookup Currency Key transformation.
  • Click the Lookup Currency Key transformation and drag the green arrow onto the newly added Lookup transformation to connect the two components.
  • In the Input Output Selection dialog box, click Lookup Match Output in the Output list box, and then click OK.
  • On the Data Flow design surface, click Lookup in the newly added Lookup transformation, and change the name to Lookup Date Key.
  • Double-click the Lookup Date Key transformation.
  • On the General page, select Partial cache.
  • On the Connection page, make the following selections:
  • In the OLE DB connection manager dialog box, ensure that localhost.AdventureWorksDW2012 is displayed. In the Use a table or view box, type or select [dbo].[DimDate].
  • On the Columns page, make the following selections:
  • In the Available Input Columns panel, drag CurrencyDate to the Available Lookup Columns panel and drop it on FullDateAlternateKey. In the Available Lookup Columns list, select the check box to the left of DateKey.
  • On the Advanced page, review the caching options.
  • Click OK to return to the Data Flow design surface.
  • Right-click the Lookup Date Key transformation and click Properties.
  • In the Properties window, verify that the LocaleID property is set to English (United States) and the DefaultCodePage property is set to 1252.

Next Task in Lesson

Step 7: Adding and Configuring the OLE DB Destination

Applies To: SQL Server 2016 Preview
Your package now can extract data from the flat file source and transform that data into a format that is compatible with the destination. The next task is to actually load the transformed data into the destination. To load the data, you must add an OLE DB destination to the data flow. The OLE DB destination can use a database table, view, or an SQL command to load data into a variety of OLE DB-compliant databases.
In this procedure, you add and configure an OLE DB destination to use the OLE DB connection manager that you previously created. 
  
To add and configure the sample OLE DB destination


  • In the SSIS Toolbox, expand Other Destinations, and drag OLE DB Destination onto the design surface of the Data Flow tab. Place the OLE DB destination directly below the Lookup Date Key transformation.
  • Click the Lookup Date Key transformation and drag the green arrow over to the newly added OLE DB Destination to connect the two components together.
  • In the Input Output Selection dialog box, in the Output list box, click Lookup Match Output, and then click OK.
  • On the Data Flow design surface, click OLE DB Destination in the newly added OLE DB Destination component, and change the name to Sample OLE DB Destination.
  • Double-click Sample OLE DB Destination.
  • In the OLE DB Destination Editor dialog box, ensure that localhost.AdventureWorksDW2012 is selected in the OLE DB Connection manager box.
  • In the Name of the table or the view box, type or select [dbo].[FactCurrencyRate].
  • Click the New button to create a new table. Change the name of the table in the script to read NewFactCurrencyRate. Click OK.
  • Upon clicking OK, the dialog will close and the Name of the table or the view will automatically change to NewFactCurrencyRate.
  • Click Mappings.
  • Verify that the AverageRate, CurrencyKey, EndOfDayRate, and DateKey input columns are mapped correctly to the destination columns. If same-named columns are mapped, the mapping is correct.
  • Click OK.
  • Right-click the Sample OLE DB Destination destination and click Properties.
  • In the Properties window, verify that the LocaleID property is set to English (United States) and the DefaultCodePage property is set to 1252.

Next Task in Lesson 
Applies To: SQL Server 2016 Preview
Now that you have completed the configuration of the Lesson 1 package, it is a good idea to tidy up the package layout. If the shapes in the control and data flow layouts are random sizes, or if the shapes are not aligned or grouped, the functionality of package can be more difficult to understand.
SQL Server Data Tools provides tools that make it easy and quick to format the package layout. The formatting features include the ability to make shapes the same size, align shapes, and manipulate the horizontal and vertical spacing between shapes.
Another way to improve the understanding of package functionality is to add annotations that describe package functionality.
In this task, you will use the formatting features in SQL Server Data Tools to improve the layout of the data flow and also add an annotation to the data flow.

To format the layout of the data flow

If the Lesson 1 package is not already open, double-click Lesson 1.dtsx in Solution Explorer.
Click the Data Flow tab.
Place the cursor to the top and to the right of the Extract Sample Currency transformation, click, and then drag the cursor across all the data flow components.
On the Format menu, point to Make Same Size, and then click Both.
With the data flow objects selected, on the Format menu, point to Align, and then click Lefts.

To add an annotation to the data flow

Right-click anywhere in the background of the data flow design surface and then click Add Annotation.
Type or paste the following text in the annotation box.
The data flow extracts data from a file, looks up values in the CurrencyKey column in the DimCurrency table and the DateKey column in the DimDate table, and writes the data to the NewFactCurrencyRate table.
To wrap the text in the annotation box, place the cursor where you want to start a new line and press the Enter key.
If you do not add text to the annotation box, it disappears when you click outside the box.

Next Steps 

Step 9: Testing the Lesson 1 Tutorial Package



SQL Server 2016
Other Versions
 Applies To: SQL Server 2016 Preview
In this lesson, you have done the following tasks:
  • Created a new SSIS project.
  • Configured the connection managers that the package needs to connect to the source and destination data.
  • Added a data flow that takes the data from a flat file source, performs the necessary Lookup transformations on the data, and configures the data for the destination.
Your package is now complete! It is time to test your package.

Before you test the package you should verify that the control and data flows in the Lesson 1 package contain the objects shown in the following diagrams.
Control Flow
Control flow in package
Data Flow
Data flow in package

To run the Lesson 1 tutorial package

  • On the Debug menu, click Start Debugging.
    The package will run, resulting in 1097 rows successfully added into the FactCurrency fact table in AdventureWorksDW2012.
  • After the package has completed running, on the Debug menu, click Stop Debugging.


No comments:

Post a Comment