This is the fifth part in a series of articles where I'm going to describe the internal workings of the StreamServer.
This part will focus on the post processor of the StreamServer Runtime Engine.
Disclaimer: The post processing features are currently being updated. The following is applicable to StreamServer 4.x and to Persuasion 5.0 through 5.4. I will revisit this article after Persuasion 5.5 has been released.
The Post-processor makes it possible to merge several data streams that were received asynchronously. To achieve this, the processing is divided into two phases. The first phase is very similar to a “normal” StreamServe job execution. The result is a number of documents, which are stored in a database, the Post-processor Repository, for later processing. In the second phase the same server, or another one, picks up the documents, groups them with other documents, performs the final processing and sends them to the desired destination using an output connector. The Post-processor storage currently supports only page-oriented output.
The problem to be solved in phase 1 is very similar to that of a “normal” job. The difference is that instead of delivering the output to an output connector, we write it to the Post-processor storage. This is done by inserting a special type of device driver into the output pipeline. This device driver writes objects, pages and documents to the storage. It uses an internal format that is a serialized version of the commands (meta records) sent from the PageOUT (and StoryTeller, XFAOUT) process to the drivers. This serialized format is called SDR.
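To make the idea of serializing driver commands concrete, here is a minimal sketch in Python. The real SDR format is internal to StreamServe and not documented here, so the record structure, command names and the use of JSON are all assumptions for illustration only.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class MetaRecord:
    # Hypothetical driver command, e.g. "begin_page" or "draw_text";
    # the actual command vocabulary of the PageOUT process is internal.
    command: str
    args: dict

def serialize_records(records: list[MetaRecord]) -> bytes:
    """Serialize a stream of driver commands to a byte blob, analogous
    in spirit to SDR (the real on-disk format is not JSON)."""
    return json.dumps([asdict(r) for r in records]).encode("utf-8")

def deserialize_records(blob: bytes) -> list[MetaRecord]:
    """Recreate the command stream so it can be replayed to a driver."""
    return [MetaRecord(**d) for d in json.loads(blob.decode("utf-8"))]
```

The key design point mirrored here is that storage happens at the driver-command level, before any device-specific formatting, so phase 2 can still choose the final output device.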
It is also possible to add meta-data to the documents and to store variables for later use. The meta-data can later be used as search keys when the second phase selects which documents to pick up. The variables are recreated in phase 2, where they can be used to hold values for settings in the phase 2 components.
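A repository that stores a document blob together with searchable meta-data and restorable variables could be sketched like this. The table layout, column names and use of SQLite and JSON are assumptions for illustration; the actual Post-processor Repository schema is defined by the product.

```python
import json
import sqlite3

# In-memory stand-in for the Post-processor Repository.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    sdr BLOB,            -- serialized driver commands (the "SDR" blob)
    metadata TEXT,       -- search keys used by phase 2 queries
    variables TEXT       -- variables recreated in phase 2
)""")

def store_document(sdr: bytes, metadata: dict, variables: dict) -> int:
    """Store one phase-1 document with its search keys and variables."""
    cur = conn.execute(
        "INSERT INTO documents (sdr, metadata, variables) VALUES (?, ?, ?)",
        (sdr, json.dumps(metadata), json.dumps(variables)))
    return cur.lastrowid
```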
Note that within a single job it is possible to send some output directly to a connector, as in a “normal” job, and other output to the Post-processor Repository. It is also possible to send the same output both to an output connector and to the Post-processor Repository.
The second phase is where the actual post processing takes place. The operation of this phase differs significantly from the normal operation of the Communication Server.
The first phase has already been run by the time the second phase is invoked, so the Post-processor Repository contains stored documents. The task is to select a number of these documents, assemble them into one large output job, perform the final processing of this job and finally send it to an output connector for delivery.
To select which documents to retrieve, a database query is used. The processing in this phase can be divided into the following steps:
- Receiving the query
- Executing the query
- Finalizing the output
- Delivering the output
The query is defined in an XML format called PPQ. The query contains information about which database to connect to and which documents to retrieve. The query can be created in a tool called the Post-processor Repository tool. With this tool the user can connect to a Post-processor Repository, view its content and create queries using a graphical user interface. The meta-data stored together with the documents can be used to select which documents to pick up.
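To illustrate the shape of such a query, here is a hypothetical XML document parsed with Python's standard library. The element names (`repository`, `select`, `criterion`) and the connection string placeholder are invented for this sketch; the real PPQ schema is defined by StreamServe and is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical query document, loosely modeled on the PPQ idea:
# it names a repository to connect to and criteria over the stored
# meta-data keys that select which documents to retrieve.
ppq = """<query>
  <repository>
    <connection>DSN=ppq_repository</connection>
  </repository>
  <select>
    <criterion key="Country" value="SE"/>
    <criterion key="DocType" value="Invoice"/>
  </select>
</query>"""

root = ET.fromstring(ppq)
connection = root.findtext("repository/connection")
criteria = {c.get("key"): c.get("value") for c in root.iter("criterion")}
```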
Receiving the query
The component that executes the query is located in the output pipeline, and the XML document (PPQ) describing the query must be delivered to this component. The input connectors are used to get the query into the server. Any input connector can be used to receive the query, but normally the directory scan connector, the HTTP connector or the Post-processor scheduler is used.
The Post-processor scheduler is an input connector written especially for the Post-processor. It takes a query in the form of an XML file and sends it into the server according to a pre-defined schedule. The schedule uses the internal scheduler of the server and makes it possible to create complex schedules for when to send the query, e.g. every third Wednesday of each month. Any number of Post-processor schedulers can be used simultaneously in the same server, each with its own schedule and query.
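The "every third Wednesday of each month" example is the kind of rule such a scheduler has to evaluate. As a self-contained illustration (not the product's scheduler API), computing that date for a given month looks like this:

```python
from datetime import date

def third_wednesday(year: int, month: int) -> date:
    """Return the third Wednesday of the given month."""
    # date.weekday(): Monday = 0, ..., Wednesday = 2
    first_weekday = date(year, month, 1).weekday()
    # Days from the 1st to the first Wednesday of the month
    offset = (2 - first_weekday) % 7
    # The third Wednesday is exactly two weeks later
    return date(year, month, 1 + offset + 14)
```

A scheduler connector would compare dates like this against the current date and, on a match, submit its configured PPQ file into the server as an input job.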
From the Post-processor Repository tool it is possible to send the loaded query to a server using HTTP or file transfer (to a directory scan input connector). This is used when starting the second phase interactively.
Other input connectors may be used when an external application wants to send queries directly to the server without user intervention.
Executing the query
The StreamServer knows from the Execution Model that it is running a Post-processor job. For Post-processor jobs, the kernel simply passes the query through to the output pipeline, where a component called the document broker has been inserted. The document broker executes the query: it parses the XML query, connects to the database and retrieves the selected documents. The documents are stored in the repository as serialized driver commands (SDR). These commands are deserialized and can then be sent through the output pipeline.
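The retrieval step of the document broker can be sketched as follows. The table layout, the JSON encoding of meta-data and SDR, and the match-every-criterion selection rule are assumptions for this sketch, not the real implementation.

```python
import json
import sqlite3

def fetch_documents(conn: sqlite3.Connection,
                    criteria: dict) -> list[list[dict]]:
    """Select stored documents whose meta-data matches all criteria and
    deserialize their stored driver commands for replay."""
    docs = []
    for sdr, meta in conn.execute("SELECT sdr, metadata FROM documents"):
        metadata = json.loads(meta)
        # Keep documents whose meta-data matches every search criterion
        if all(metadata.get(k) == v for k, v in criteria.items()):
            # Deserialize the stored commands ("SDR") back into records
            # that can be sent through the output pipeline
            docs.append(json.loads(sdr))
    return docs
```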
Finalizing the output
The Post-processor has a number of formatting components that it can invoke, and it finds out which ones to invoke from the Execution Model. Examples are components for sorting and enveloping. These components are handled as a pipeline of components, in the same way as the output pipeline and the input pipeline. After the data has been sent through the document broker pipeline it is sent to the output pipeline.
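The component chain can be sketched as a simple pipeline where each stage transforms the document list and hands it on. The sorting and enveloping components below are illustrative stand-ins, not the product's actual component set.

```python
# Sketch of a document-broker pipeline: each component takes the list of
# documents and returns a transformed list, mirroring how sorting and
# enveloping components would be chained.

def sort_by_zip(documents: list[dict]) -> list[dict]:
    """Sort documents by postal code, e.g. for postage discounts."""
    return sorted(documents, key=lambda d: d["zip"])

def envelope(documents: list[dict]) -> list[dict]:
    """Group consecutive documents for the same recipient into one envelope."""
    envelopes: list[dict] = []
    for doc in documents:
        if envelopes and envelopes[-1]["recipient"] == doc["recipient"]:
            envelopes[-1]["pages"] += doc["pages"]
        else:
            envelopes.append({"recipient": doc["recipient"],
                              "zip": doc["zip"], "pages": doc["pages"]})
    return envelopes

def run_pipeline(documents, components):
    """Run the documents through each component in order."""
    for component in components:
        documents = component(documents)
    return documents
```

This mirrors the article's point that the post-processing stages form a pipeline just like the input and output pipelines: each stage is independent and only agrees on the shape of the data flowing through.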
The output pipeline processes the data in the same way as in a “normal” job. One of the last components in the pipeline is always a device driver that accepts the commands sent from the document broker and creates device-formatted data.
The variables stored in phase 1 can be used for any component settings. Variables can also be stored per page, and thereby supply a unique value for each page in page-level settings.
Delivering the output
As the final step in the post processing, the output is delivered to its destination using the output queue and the output connectors. This step does not differ from “normal” job processing. The variables stored with the documents can be used as connector parameters.
This article concludes the architectural overview of the StreamServer. It will probably be necessary to revisit this topic in future articles, but next we will move on to the execution logic of the StreamServer.
InDepth Part 6 - StreamServer Runtime Engine - Job Execution