
Waiting for a file and processing it

A perennial request from developers is to wait for incoming files in a directory and, when those files arrive, to process each one with a scenario. ODI provides a tool for waiting on files (the OdiFileWait tool) and lets you take an action afterwards. The complexity comes when many files may arrive at once and each one must be processed individually. To address this, we need to add a little cunning to our approach. The starting point is to create a new ODI procedure, with a set of options:

This procedure will be used in the package we create. The package has three steps:

1) An OdiFileWait step, which polls the appropriate directory, waiting for files to be ready to process. (Note that it is best practice when moving files, especially files received over FTP, to move two files: the file you want and an associated marker file, which is moved AFTER the transfer of the actual file has completed. This technique gets round the problem of slowly written files being processed before they are ready. The marker file may contain the name of the file you want, be named similarly, or be transmitted only once all files in the transmission are complete.)

2) Process Waiting Files, the procedure we create to deal with the file(s) that need to be processed.

3) A Start Self step, which starts a new execution of the whole wait-and-process package. (The reason we don't just loop is so that we can see that each run has completed, and potentially purge the log of completed executions.)
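The marker-file convention mentioned in the first step can be sketched in a few lines of Python. This is only an illustration of the idea, not ODI functionality: the `ready_files` helper and the `.done` marker suffix are assumptions made for this example, and a real transfer may use a different naming scheme.

```python
import glob
import os


def ready_files(directory, pattern="*.zip", marker_suffix=".done"):
    """Return the data files whose marker file has already arrived.

    Assumption for this sketch: the marker is named like the data file
    with ".done" appended (for example a.zip and a.zip.done).
    """
    ready = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        # A data file counts as complete only once its marker exists.
        if os.path.exists(path + marker_suffix):
            ready.append(path)
    return ready
```

With this convention, a file that is still being written is simply skipped until the sender drops the marker next to it.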

In the package, this image shows the OdiFileWait parameters. In this case I am waiting on the d:/Temp directory for all files with a .zip extension. I have told ODI not to do anything with the file (Action: NONE), although it may be a good idea to move the file to a processing directory. I have not used any of the other options, which let me, for instance, specify the number of files to wait for (I might want to process batches).
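Conceptually, the wait step is a polling loop over the directory. The sketch below mimics that behaviour in plain Python so the idea is easy to see; it is not how the ODI tool is implemented, and the function name and its parameters (`file_count`, `poll_seconds`, `timeout_seconds`) are hypothetical names chosen for this example.

```python
import glob
import os
import time


def wait_for_files(directory, pattern, file_count=1,
                   poll_seconds=1.0, timeout_seconds=None):
    """Block until at least `file_count` files match, then return them.

    Returns an empty list if `timeout_seconds` elapses first.
    """
    deadline = None
    if timeout_seconds is not None:
        deadline = time.time() + timeout_seconds
    while True:
        matches = sorted(glob.glob(os.path.join(directory, pattern)))
        if len(matches) >= file_count:
            return matches
        if deadline is not None and time.time() >= deadline:
            return []
        # Nothing (or not enough) yet: sleep, then look again.
        time.sleep(poll_seconds)
```

Calling `wait_for_files("d:/Temp", "*.zip")` would correspond to the configuration described above: wait without a timeout until at least one .zip file is present, and take no action on the files themselves.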

When ODI has detected files that match the specified criteria, it moves on to the next step in the package, which is my procedure to process the incoming files. This procedure has the set of options shown below, which I need to fill in:

Of the options shown here:

PATTERN is the pattern for the files that we actually want to process, not those we are waiting on (which may be different).

SCEN_NAME is the name of the scenario to be executed for each of the files.

SYNC_MODE is 1 for synchronous and 2 for asynchronous. Be careful with this one: if you execute asynchronously, you will get as many concurrent executions of the scenario as there are files. If the tasks within the scenario have not been specifically modified to allow for concurrent execution, you may have problems (by default, for instance, all the temporary tables for a particular interface use the same name).

INCOMING_DIRECTORY is where the files for processing are located. This will be the same directory as in the first step, unless you chose the MOVE option to move the files to a separate directory.
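The difference between the two SYNC_MODE values can be illustrated with a small Python sketch. It is an analogy only: `run_for_each` and the `run_scenario` callable are hypothetical stand-ins, and threads are used here merely to show that the asynchronous mode produces one concurrent run per file.

```python
from concurrent.futures import ThreadPoolExecutor


def run_for_each(files, run_scenario, sync_mode=1):
    """Call run_scenario(file) once per file.

    sync_mode 1 = synchronous, 2 = asynchronous (mirroring the option).
    """
    files = list(files)
    if sync_mode == 1:
        # Synchronous: each run finishes before the next one starts.
        return [run_scenario(f) for f in files]
    if sync_mode == 2:
        if not files:
            return []
        # Asynchronous: as many concurrent runs as there are files.
        with ThreadPoolExecutor(max_workers=len(files)) as pool:
            return list(pool.map(run_scenario, files))
    raise ValueError("sync_mode must be 1 or 2")
```

In mode 2, anything the runs share (such as a temporary table with a fixed name) is touched by all of them at once, which is exactly the risk described above.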

The last step I have put in the package is the execution of the package itself, asynchronously. It uses SESS_NAME as the name of the scenario to start. This may not be correct if the original scenario was started using a NAME= parameter.

Procedure Detail
This is the detail of the procedure I created. As you can see, it has only four steps. It could be done in fewer, but this makes the process more readable.

Figure 1 The steps of the file processing procedure

The first step creates the temporary table I will use to store the names of the files to process. I have put this table in the Sunopsis Memory Engine, an in-memory database that is part of the ODI product. As this step has only one command, I put it on the Command on Target tab. The first part of the command gets the JDBC connection we will use; to supply its parameters, I have set the database I wish to use on the Command on Source tab (see the following image). As the table is created in memory in the execution agent and should be disposed of when the session completes, there should be little chance of it taking up too much memory. I have also created it with the session ID as well as the file name, just in case sessions overlap.

Figure 2 Command on target for the create table step of the procedure

Figure 3 Command on Source for the create table step of the procedure

Next is the step that retrieves the list of files and inserts them into the newly created in-memory table. Here we use some native functionality of the Jython scripting environment: glob.glob(filepattern) returns a list of the matching file names, which we then insert into the table with INSERT statements.
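As the figure with the actual Jython is not reproduced here, the following is a minimal sketch of the same idea in plain Python. Several things are assumptions made for the example: an in-memory SQLite database stands in for the Sunopsis Memory Engine connection, and the table name `FILES_TO_PROCESS` and its columns `SESS_NO` and `FILENAME` are hypothetical.

```python
import glob
import os
import sqlite3


def load_file_list(conn, directory, pattern, session_id):
    """Insert one row per matching file; return the number of rows added."""
    # Hypothetical table: session id plus file name, as described above.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS FILES_TO_PROCESS "
        "(SESS_NO INTEGER, FILENAME TEXT)"
    )
    # glob.glob returns a list of the paths that match the pattern.
    names = sorted(glob.glob(os.path.join(directory, pattern)))
    conn.executemany(
        "INSERT INTO FILES_TO_PROCESS (SESS_NO, FILENAME) VALUES (?, ?)",
        [(session_id, name) for name in names],
    )
    conn.commit()
    return len(names)


# Usage with an in-memory database standing in for the memory engine:
# conn = sqlite3.connect(":memory:")
# load_file_list(conn, "d:/Temp", "*.zip", session_id=1001)
```

Storing the session ID with each row means a later step can select only the rows that belong to the current session.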

Figure 4 Command on Source for the Retrieve file list step of the procedure

In the next step, the Command on Source retrieves the file names from the table. Note that we set the Technology and Schema here to match the memory engine, pre-defined in Topology.

Figure 5 Execute Scenario for each file step "Command on Source"

For the Command on Target, we execute the OdiTool command OdiStartScen once for each of the files retrieved by the Command on Source. Note that to substitute the value of FILENAME from the result set of the source command, we use the # prefix. Note also that the name of the variable I am passing in to each of the scenarios is set here as ORACLE.FileToBeProcessed. This implies that I have a project with the code ORACLE and that the variable is called FileToBeProcessed, so a variable FileToBeProcessed needs to be created. It might be better to use a global variable, to eliminate the need to edit the project code in the procedure. This also supposes that the scenario you are starting DECLAREs this variable, which may then be used in the scenario, including in resource names (file names) and so on.
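The effect of the # substitution can be pictured as rendering one command line per row returned by the source query. The sketch below does that in Python. The exact command from the figure is not reproduced here, so the parameter layout is an assumption based on a typical OdiStartScen call; the helper name, the scenario name `LOAD_FILE` and the version value `-1` are likewise hypothetical choices for the example.

```python
def start_scen_commands(filenames, scen_name, scen_version="-1",
                        sync_mode=1, variable="ORACLE.FileToBeProcessed"):
    """Render one OdiStartScen command line per file name.

    Each file name plays the role of #FILENAME from the source result set.
    """
    template = (
        'OdiStartScen "-SCEN_NAME={scen}" "-SCEN_VERSION={ver}" '
        '"-SYNC_MODE={sync}" "-{var}={value}"'
    )
    return [
        template.format(scen=scen_name, ver=scen_version,
                        sync=sync_mode, var=variable, value=name)
        for name in filenames
    ]


for command in start_scen_commands(["d:/Temp/a.zip", "d:/Temp/b.zip"],
                                   "LOAD_FILE"):
    print(command)
```

Swapping `variable` for a global variable name would remove the dependency on the ORACLE project code, which is the alternative suggested above.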

Figure 6 Execute Scenario for each file step "Command on Target"

The last step in the procedure is just a tidy-up step, to drop the table I created earlier. As there is only one command, this goes on the Command on Target tab.

Figure 7 Last step, Drop Table, Clean up after yourself

Appendix 1: Getting the value of parameters passed to a scenario


One last tidbit which may be useful: if you are passing variables into a scenario and want to confirm that those variables have been set, note that by default their values are not shown in the log. A couple of workarounds exist to get that information:

1) Put a tool step in your package, something like an OdiSleep, and modify the code to be executed as illustrated:

The result of which will show in the log as follows:

2) The second option is to do a similar thing, but to put the code into a procedure as follows:

As you can see, here I simply made a Java BeanShell step and put the code inside a comment (/* */), so it is ignored by the interpreter.
