Friday, July 24, 2009

The 10 Seconds Pentaho Metadata Editor Tutorial


Quick Step By Step Guide

You can find the main Pentaho Wiki doc here.
This is an updated (2013-08-19) and extended version of the original tutorial which I posted here a few years ago. It will take you more than 10 seconds now to read through it, but the instructions should be more precise than before.

Specifying JNDI and adding the JDBC driver



  1. Define a JNDI Connection. You have to use the same JNDI connection as you specified on the BI Server (in the Pentaho Administration Console). The JDBC details can be specified in the jdbc.properties file in the following directory: <pme-root-dir>/simple-jndi

    Add the following (amend to your requirements):
    dwh/type=javax.sql.DataSource
    dwh/driver=com.mysql.jdbc.Driver
    dwh/url=jdbc:mysql://localhost:3306/pentaho
    dwh/user=root
    dwh/password=
    The first part before the forward slash is the JNDI name (so in this case dwh).
  2. Check if the required JDBC driver is installed under <pme-root-dir>/libext/JDBC.
    If not, download the JDBC driver from your vendor’s website and copy the jar file into this folder.

Importing the physical tables



  1. Start PME by running the following command in the PME root directory: sh ./metadata-editor.sh
  2. Click on File > Save and specify a good name for your metadata model.
  3. Right click on Connections on the top left hand side and choose New Connection .... Define your connection details, make sure you choose JNDI in the Access selection and specify the same JNDI name in the Settings section as you originally specified in the jdbc.properties file.
    Some interesting options are found in the
    Advanced section: If you work with a database like PostgreSQL or Oracle that support proper schemata, you can define the default schema to use here. Also, if you have columns of type boolean you can also enable support for them:

    After entering all the details, click the
    Test button to make sure that your connection details are correct. Once you get a successful return message, click OK.
  4. Right click on the database connection you just created and choose Import Tables:
  5. Expand the database node (dwh in the screenshot below) so that you can see all the imported tables:
  6. Specify table properties: Double click on each table to specify various settings, for example whether it is a fact or dimension table. For measures, also configure the aggregation type. If you want to add calculated fields, now is the time to do so: Double click on the respective table. Once the window is open, click on the + icon. Give the field a name (e.g. PC_Amount_of_users). Define the aggregation type (e.g. Count) and the data type. If you don't know the length and precision yet, set them to -1. Define the formula (for a simple count or sum, just write the name of the column in there). That's it (the field type can stay on "other").
    In the formula field you can also use database-specific functions (e.g. "YEAR(date)"). In this case you have to tick "Is formula exact?".
    You can add other properties like text alignment or date mask by clicking on the + icon.
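As a sketch, here are two formula values as you would enter them in PME for a calculated field. The column names user_id and registration_date are made up for illustration; replace them with columns from your own physical table:

```sql
-- simple case: the formula is just the column name, the Count aggregation does the rest
user_id

-- database-specific case (tick "Is formula exact?"), e.g. extracting the year
YEAR(registration_date)
```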

Understanding Is the Formula Exact?

You can add columns which are based on native SQL fragments, for example:
((CURRENT_DATE - start_date)/30)::int + 1
You specify this in the Formula Value field:
If you tick Is the Formula Exact? this basically means that the PME engine will not try to interpret this SQL fragment but instead push it directly to the database.

The disadvantage of this approach is that you might end up using functions which are specific to your database, so the model will not be that easily portable to other DBs (in case you ever have to migrate it).

Another common use case is to add a measure in case your raw data table doesn’t have one:
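For instance, if a hypothetical raw table fact_logins has one row per login but no numeric column, you can still get a login count: create a calculated column with aggregation type Count whose formula simply references any non-null column, or push a native aggregate down with "Is formula exact?" ticked. The column name login_id below is invented:

```sql
-- counting rows via a column reference (aggregation type: Count)
login_id

-- or a native fragment pushed straight to the database ("Is formula exact?" ticked)
COUNT(*)
```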

Defining a business model


  1. Right click on Business Model and select New Business Model, then give it a name.
  2. Drag and drop the tables onto the main working area.
  3. Double click on the table and go to Model Descriptor. If the type is incorrectly set (or not applicable for this model) click on the overwrite icon and define the respective table type (fact or dimension). Click OK.
  4. In order to create relationships between tables, select the two tables while pressing down the CTRL key and then right click on the last table and choose New Relationship:
    Another way to do this, although not that convenient, is to right click on the work area and choose
    New Relationship:

Create a business view


  1. Once the business tables and relationships are established, we can create the business view. Right click on Business View and select New Category. An easier way to do this is to choose Tools > Manage Categories (or click the Category Editor icon in the toolbar). This will bring up the Category Editor dialog: just click the + icon to add new categories. Define categories, e.g. Date, Measures, Countries. Categories are basically buckets that help you organize the various business columns.
  2. Next we want to assign business columns to each category. If you created your categories in the tree view, right click on Business View and choose Manage Categories. Once in the Category Editor, use the arrows to move the fields into the categories.

Testing the metadata model

Now that the main metadata model is defined, it is time to test it. Click on the Query Builder icon in the toolbar and run some test queries. You can check the generated SQL by clicking on the SQL icon.

Publish the metadata model to the Pentaho BI Server

If testing is successful, publish the model to the BI server (click on File > Publish To Server ...). The final metadata model is saved as an XMI file. On the BI Server there can be only one XMI file per solution folder. Make sure that


  • You have an account on the BI server
  • The publisher password on the BI Server is set up and you know it.
  • You know the name of the solution folder that you should publish the model to.
  • The URL to publish to is something like http://localhost:8080/pentaho/RepositoryFilePublisher
    Make sure you have
    RepositoryFilePublisher at the end of this URL!

Tips and tricks

Make use of Concepts

In the toolbar you can find the Concept Editor icon. Concepts are pretty much like CSS style definitions. Concepts can be assigned to each data point and are used in the final reports as default formats. One of the most important properties is probably Mask for Number or Date, which allows you to enter standard formatting strings (e.g. #,###.00).

To assign a Concept simply right click on the data point and choose Assign Parent Concept.

Referencing the same table more than once

In case one dimensional table is referenced more than once by your fact table, just drop the dimensional table several times into the business view and rename each of them. Then create separate relationships for all of them.
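As an illustration, if a hypothetical fact_orders table references a dim_date table twice (once for the order date and once for the ship date), the two renamed copies with their separate relationships end up producing joins along these lines (table and key names are made up):

```sql
-- sketch of the SQL generated for two copies of the same dimension
SELECT ...
FROM fact_orders f
JOIN dim_date order_date ON f.order_date_key = order_date.date_key
JOIN dim_date ship_date  ON f.ship_date_key  = ship_date.date_key
```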

How to create formulas

Take a look at Pentaho Wiki for an introduction.

How to implement data security



  1. Click on your business model. Go to Tools > Security and import the roles from the Pentaho BI Server by entering the following URL (amend if necessary): http://localhost:8080/pentaho/ServiceAction
    This will allow you to restrict the data for certain roles. If the connection works ok, you will see an XML extract of the roles definition.
  2. Go to your Business Model (this is one hierarchy level below Business Models and has a brown briefcase symbol next to it), right click on it and choose Edit. It is important that this is implemented on this level, as otherwise it won't work.
  3. In the Metadata Security section add all the users/groups that you want to allow access. Assign the Update right to users that can save and edit their ad-hoc reports.
In the Data Constraint section add the users/groups and specify their access rights. If you want to restrict access to a certain value of a specific column (row level security), specify it as follows. In this example we restrict access to the country ITALY:
[BT_SN_LT_COUNTRIES_SN_LT_COUNTRIES.BC_SN_LT_COUNTRIES_COUNTRY_NAME]="ITALY"
Also, if one user/group should have access to everything, you have to set the constraint to TRUE().
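If a role should see rows for more than one value, conditions can be combined with OR(), following the formula conventions PME uses elsewhere. A sketch (the column identifier below is made up; use the one from your own model):

```
OR([BT_COUNTRIES_COUNTRIES.BC_COUNTRIES_COUNTRY_NAME]="ITALY";[BT_COUNTRIES_COUNTRIES.BC_COUNTRIES_COUNTRY_NAME]="SPAIN")
```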

Thursday, July 23, 2009

Pentaho Xactions: Prepare your data for the Google Visualization API


We want to create a line chart like the one shown in http://code.google.com/apis/visualization/documentation/gallery/linechart.html. Take a good look at the code below the chart; this is the code that we will try to prepare here.

Let's kick off with a simple MDX query that will display the amount of new registrations for the period 2009 Week 10 to Week 22:

SELECT 
[Country].Children
ON COLUMNS,
{[Registration Time Weekly].[2009].[10]:[Registration Time Weekly].[2009].[22]}*{[Measures].[New Registrations]}
ON ROWS
FROM
[Users]


I created a "Get Data From OLAP" step in my Xaction in the Pentaho Design Studio, inserted this query and specified all other necessary parameters.

This step will bring back a query result. Name the result "query_result". To check what it looks like, you can drag and drop "query_result" into the "Process Outputs" area. Save the file and execute it on the Pentaho BI Server. Now you will get a rough idea of what the data looks like.

Now, let's write some JavaScript. Create a new step by choosing "Get data from > JavaScript". For the script inputs choose our "query_result". Now paste the following JavaScript code into the main text area.

It is important to understand that x and y axis titles are stored in the metadata object. As we need these titles for the Google Chart definition, we will have to read them out. 
In the y axis we actually have two titles, the first one being "New Registrations" and the second one being the week numbers. We will have to add the week numbers to the main dataset that we pass on to the Google chart definition. We will use getRowHeaders() and a loop to read out these titles. It is worth mentioning that in the array the week numbers sit in the first column and "New Registrations" in the second column (so the opposite of how they are displayed).

Let's start with our JavaScript:

var thedata = "";

// we get a simple row and column count of the data set
var rowCount = query_result.getRowCount();
var colCount = query_result.getColumnCount();

var meta = query_result.getMetaData();

// get the column headers
var colHeaders = meta.getColumnHeaders(); // getColumnHeaders() returns object[][]

// create the addColumn() code for the Google chart definition
var colName = "";
for (var i = 0; i < colCount; i++) {
   colName += "data.addColumn('number','" + colHeaders[0][i] + "');\n";
}

var rowHeaders = meta.getRowHeaders(); // getRowHeaders() returns object[][]

// read out the week numbers: they sit in the first column of the array
var rowName = "";
for (var i = 0; i < rowCount; i++) {
   rowName += rowHeaders[i][0] + "\n";
}


It is worth testing the x and y axis title output separately before going ahead. Make sure that everything looks fine! This is the first part. Define colName as a script output, then drag and drop "colName" from the "Process Actions" area to the "Process Outputs" area. Save everything and execute the Xaction on the BI Server. You will see that we have now created part of the code that we will use in the Google Visualization Chart definition. Do the same test for "rowName", and if everything is fine, carry on with this code:
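If you want a feeling for the string being built before wiring everything up on the server, the colName loop can also be exercised standalone (e.g. in Node.js) with a made-up header array. The country names below are just placeholders:

```javascript
// mimic what getColumnHeaders() returns inside the Xaction: object[][]
var colHeaders = [["Germany", "Italy", "Spain"]];
var colCount = colHeaders[0].length;

// build one data.addColumn() line per result column, as in the script above
var colName = "";
for (var i = 0; i < colCount; i++) {
    colName += "data.addColumn('number','" + colHeaders[0][i] + "');\n";
}

console.log(colName);
```

The output is exactly the snippet that later fills the {colName} placeholder of the chart template.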

// loop through each row and column
for (var row = 0; row < rowCount; row++) {
    // column 0 of the chart data holds the week label taken from the row headers
    thedata += "data.setValue(" + row + ",0,'" + rowHeaders[row][0] + "');\n";
    for (var column = 0; column < colCount; column++) {
        // check if the value is not a number
        if (isNaN(query_result.getValueAt(row, column))) {
            // prepare the string that the Google Visualization Chart expects -> for text
            // Important: the column index has to be increased by one, as the week label is our first column!
            thedata += "data.setValue(" + row + "," + (column + 1) + ",'" + query_result.getValueAt(row, column) + "');\n";
        } else {
            // prepare the string that the Google Visualization Chart expects -> for numbers
            // Important: the column index has to be increased by one, as the week label is our first column!
            thedata += "data.setValue(" + row + "," + (column + 1) + "," + query_result.getValueAt(row, column) + ");\n";
        }
    }
}

Test the output for "thedata" and check if it is in line with the Google Chart definition.
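To sanity-check this loop outside the BI Server, you can mock the query result with a small object. The week numbers and values below are invented:

```javascript
// invented sample data: two weeks, two countries
var values = [[120, 95], [130, 101]];
// mimic getRowHeaders(): week number first, measure name second
var rowHeaders = [["10", "New Registrations"],
                  ["11", "New Registrations"]];

// tiny mock of the Xaction result set used by the loop above
var query_result = {
    getRowCount: function () { return values.length; },
    getColumnCount: function () { return values[0].length; },
    getValueAt: function (row, column) { return values[row][column]; }
};

var rowCount = query_result.getRowCount();
var colCount = query_result.getColumnCount();

var thedata = "";
for (var row = 0; row < rowCount; row++) {
    // column 0 holds the week label taken from the row headers
    thedata += "data.setValue(" + row + ",0,'" + rowHeaders[row][0] + "');\n";
    for (var column = 0; column < colCount; column++) {
        var value = query_result.getValueAt(row, column);
        if (isNaN(value)) {
            // text values get quoted
            thedata += "data.setValue(" + row + "," + (column + 1) + ",'" + value + "');\n";
        } else {
            // numbers are emitted as-is
            thedata += "data.setValue(" + row + "," + (column + 1) + "," + value + ");\n";
        }
    }
}

console.log(thedata);
```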

Now, let's create a Message template to prepare the final html output. Define colName, rowCount and thedata as input. Insert this code into the main text area:

<html>
  <head>
    <script type="text/javascript" src="http://www.google.com/jsapi"></script>
    <script type="text/javascript">

//-------------------- Column Chart ------------------//
      google.load("visualization", "1", {packages:["linechart"]});
      google.setOnLoadCallback(drawChart);
      function drawChart() {
        var data = new google.visualization.DataTable();
        data.addColumn('string', 'Week');
// get column heads from dataset metadata
{colName}
        data.addRows({rowCount});

{thedata}

        var chart = new google.visualization.LineChart(document.getElementById('chart_div'));
        chart.draw(data, {width: 400, height: 240, legend: 'bottom',  title: 'New Registrations By Channel Last 7 Days'});
      }
    </script>
  </head>

  <body>
<div id="chart_div" style="float:left;clear:both"></div>

  </body>
</html>


Then define the output name as "out". Drag and drop it into the Process Outputs area (make sure you delete any other outputs beforehand), then click on "out" in the Process Outputs area and change the "Name" from "out" to "content" in the Output Destination.

That's it. Save everything and then you should have a fancy Google line chart working.

You can download the file from here (it's not exactly the same, but quite similar).

Please note: If you are using SQL, the JavaScript has to be different. I provide one example using SQL here.