OLAP Course Intro
Dimensions and facts: measures
Measure groups
ETL : fuzzy lookup transformation, logging execution results, placing checkpoints.
Define cube storage options.
KPI: orange,yellow,red.
Process and deploy a cube.
Why send a mailer selling a van to single people?
Most important: data-mining, prediction, if i follow this trend where i will be? SQL Server comes with 7 algorythms.
Use report manager to deply reports.
VVVVVVVVVVVVVVVVVVVVVVVVVVVV
Follow-up:
MDX to pull data from cube? Using SSRS? A:Yes. Only MDX pulls from cube.
Can storage be changed from MOLAP to HOLAP to ROLAP? (So some OLAP cube data is left in RDMS?) A: Dont have to rebuild, but may need to restart or redeploy.
Cubes are “read-only”, but sometimes users can writeback to update the cube? A: 99% of time read-only. Must spec in in build of cube you want writeback and all writebacks go to one table.
Datamart = cube, or a type of data in a group of cubes?
“Deploy” a cube?
“Deadlocks”? Long locks hold up the database, but deadlocks don’t let the table go free for other writes. Possibly due to two tables referencing each other. SQL Profiler helps show deadlocked processes….bugs in sql code.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Cubes are denormalized to speed up processing – as decision re what to include has already narrowed the bulk down.
Download SQL2008:
http://www.mhprofessional.com/downloads/products/0071549447/0071549447_Code.zip
Download databases:
See sqldownloadinstructions.txt in S: drive.
http://bellevuecollege.edu/lmc/media_reserve.html
What is BI? Method of storing and presenting key enterprise data to make decisions and forecast future results. Should we add more machines, or more sales people? Are we getting more requests or more customers? Less in some areas or for some products?
All the data must be converted into a cube periodically, so data in reports is not stale. Must collect data, and run ETL from one source into the cube. RDMS, Staging server, BI Cube (unified location of read-only data warehouse full of datamarts/cubes), Deploy reports to application server (LAN users) or a web server (WAN users). In some cases users can writeback to the BI server cube. IIS and web technologies should be separate from SQL server technologies.
XMLA: copying, backups, viewing metadata of your cube. Useful for giving admins metadata instructions to deploy cubes.
No locks needed on OLAP tables.
Snapshots, realtime, or cached queries.
Rapidly changing data (dollar amounts) vs slowly changing data.
OLAP Modelling
Schema: definition of data content and structure of tables.
Dimensions are the ways to slice your data, each possibly having multiple attributes. Do not look at relational tables to build dimension tables in your cube, look at the needs of user reports. Not all attributes need to be included.
FACT TABLE: aggregations of the facts/measures: units made, sales figures, and keys to dimension tables.
Star schema: one fact table, multiple dimension tables. Note that some cubes have two or more fact tables: resellersales and internetsales. (a not-so-strict-star schema) Tables are “flat”: no subcategories.
>>>>>>>>>>>non-relational cube tables duplicate data?
>>>>>>>>>>> none of the aggregations in a fact table are hierarchical groupings?
Snowflake schema: if goal is multiple drill down levels, hierarchical layers, then a snowflake schema is needed.
1. Grain statements: To being designing a cube, work from end to beginning, customer executive back to data needed. Start with “grain statements” : 1) What are key metrics, top interests, measures? 2) by what factors are key interests evaluated? by store, by area, by date? 3) by what level does each factor need to be evaluated? by store or city or region? by day or month?
With UDM, cubes can be built straight from the source, without a staging database.
>>>>>>>>>>>>>What determines the dimensions loaded by the client? “Significantly different from SSAS 2000″
2. How to maintain data, handle the changes, tweeking storage options: Update options are applicable to SCD and RCD’s. Must decide which dimensions are which, to determine how updates to cube are made.
>>>>>>>>>>>processing = ETL, aggregation, calculated measures?
Just the measures from a fact table (no keys) = the measure group, making report access easier as it will be in memory. multiple measure groups can be designed for a cube. eg,just the measures for 2005.
Action Modelling: Designing a trigger based on condition when building a cube that gives users the ability to drill-down on a cell.
Perspectives: Like views.
Translations: Gives the ability to pickup browser locale and give manually translated text for field names. Or set the format.
OLAP Course Class 2
Create a new vs ssas project, and a db, both named “MaxMinSalesDM”. Use the Simple db recovery option.
Sometimes the Time dimension doesn’t need to be created on its own because other tables may already have time-related data in them.
Create a cube, choose ‘Generate a schema for every row’. Starts the wizard to help you define attributes for the dimension tables.
4 Items that create a cxn: Server,Database,Authentication,Provider (Oracle,SQLExpress,SQLServer,etc.)
Windows authentication will take the currently logged on person. Impersonation will always use the set user account and pwd instead of the logged on user. 4 Impersonation options: ‘SQLServer authentication’ needs to be allowed on the sql server, and an account needs to be created.
“Use the Service account” is NTAuthority\System which has higher privileges than a guest. VVVVV Need sql admin to allow this, or creating cube configures this? “Use the credentials of the current user” will work when the Ops or SQL Admins have given you access. “Inherit” permissions of “caller” VVVV
>>>>A view is a saved Select statement?
Design and Build a Cube:
1.Create Dim and Fact tables. 2.ETL fill tables 3. Design the cube: Calcs,KPI’s,Actions: Write the statements defining the code 4.Process (compile and do calculations) and deploy the cube to a SSAS cube.
Snowflake schema has hierarchies.
SSIS:
Control Flow is made of ‘buckets of tasks’, prebuilt or custom built, some having data flow actions and others run a script, and others ftp some files or email someone.
Connection Managers define the interfacing of the package.
“External Column” is fieldname in source, “Output Column” is the … as, what you might want to name it.
Red output arrow is ‘what to do if this task throws an error’.
Employee$ is what? vs Employee ?
>>>>>>A “named query”?
>>>>>>create and consume flat files into/from where
A split transform: Default output is all data outside conditions, the leftovers that you go to a different destination.
Make a habit of rendering the Mappings before filling a Preview.
VVVVVVVVVVVVVVV
- “The selected data source is on a remote computer. The bulk insert operation can be executed only on the local coputer”.???
- MaxMinManufacturingDM_Db_structure.sql Can this be autocreated in SSMS?
OLAP Course Class 3
Named queries/calculation pulls data from source to create extra column.
Calculated measures = measures that do not exist in the fact source, but created when processed. They sit on top of measures in the cube, and unless processed, you won’t get the extra columns and the aggregations that fill them.
Dimension tab shows which Dim PK’s feed which Measure tables.
“Non-empty” behaviour
Measure groups require processing power – delete any unnecessary ones.
Measure group = only the measures, only the facts – no key pointers
“Granularity Level” = SSAS term for fk/pk relationships (shown on Dim tab).
If column name is unique, no need to clarify with [Measure Name] or [Dimension Name], no need for Fully qualified naming.
ROLAP = all low level details and aggregations are stored in relational DM, only structure of cube is in SSAS. Running report pulls “live”/”real-time” data from the DM. Report will be delayed as cube processes/fills, but data is least stale. It’s as up-to-date as the SSIS jobs putting data into the DM. ROLAP good for Rapidly changing dimensions.
MOLAP = Multidimensional OLAP. Schema of cube, and some pre-aggregations levels will be calculated, eg, product code. Not all levels, but some, the most commonly used in reports. All data as well is being pulled from the cube in SSAS as well. Processing is not real-time, but reports will be faster than ROLAP storage.
HOLAP = Hybrid. Top-level aggregates are stored in SSAS, but low-level data when needed by a report drill-down, will pulled as needed.