
    Statistics for StreamServe projects

  • Jesper Werge StreamServe Employee
    0 likes 19881 views

    Hi all,

    I got a request to write a theoretical solution for a "live" statistics system that supports both batch and ad-hoc document creation statistics.

    It should be fairly simple, so it could be implemented in a lot of different StreamServe project scenarios. I know it will be difficult to achieve the flexibility needed for every StreamServe project to deliver to such a system, but that is the goal.

    For batch jobs (one file, many documents) I picture a job begin and job end timestamp for a given input file, a unique ID for the job, a total count of documents, and a count for each type of output (e.g. PCL, AFP, Email, SMS), which could hold both document count and page count. There should also be a record of each document within the job, with some kind of unique ID for the document, for later tracing.
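
    To make this concrete, here is a rough sketch of the kind of record I have in mind (all names below are just placeholders, not an actual schema):

        // Hypothetical data model for one batch job - all field names are illustrative only.
        import java.time.Instant;
        import java.util.List;
        import java.util.Map;

        class JobStatistics {
            String jobId;                      // unique ID for the job
            String inputFile;                  // the input file that started the job
            Instant jobBegin;                  // job begin timestamp
            Instant jobEnd;                    // job end timestamp
            int totalDocuments;                // total count of documents in the job
            Map<String, OutputCount> outputs;  // per output type: PCL, AFP, Email, SMS...
            List<DocumentRecord> documents;    // one record per document, for later tracing
        }

        class OutputCount {
            int documentCount;
            int pageCount;
        }

        class DocumentRecord {
            String documentId;  // unique ID for the document
            boolean archived;   // delivered to the archive?
            boolean printed;    // printed, as far as StreamServe is able to know?
        }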

    For ad-hoc documents I have no idea how to manage this at this point, but perhaps just a job type field would do.

    There is also a requirement that a given document can be checked for whether it was delivered into the archive as well as whether it was printed (as far as StreamServe is able to know), so some kind of trace should be built in.

    I have no restrictions on database usage or other software requirements for the StreamServe projects.

    I will develop an interface for the statistics system, probably a web application in either C#.NET or Java. Maybe RSS feed functionality could be included for easy access to the statistics information.

    I hope some of you have been involved with, or perhaps even developed, a statistics system of some kind that StreamServe projects deliver data to, and would like to share your knowledge and tips/tricks.

     

    /Jesper

    Monday 09 August, 2010
  • David Shih StreamServe Employee
    3 likes

    One option would be to try to use the Usage Statistics Reporter ("USR" or "Communication Reporter"). As of SP4R2, we're still using the Poet FastObjects database ("StreamServe Object Store"), and querying data directly from the USR Repository would be completely unsupported. But you could try editing the template project to generate an XML Consolidation Report when some other application calls for it. That other application could then parse the XML.
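
    As a very rough illustration of the parsing side (the element and attribute names here are made up; you would have to check the actual Consolidation Report format), the other application could do something like this:

        // Sketch: parse a consolidation report with the standard JAXP DOM API.
        // "job" and "documentCount" are assumed names, not the real report schema.
        import java.io.File;
        import javax.xml.parsers.DocumentBuilderFactory;
        import org.w3c.dom.Document;
        import org.w3c.dom.Element;
        import org.w3c.dom.NodeList;

        public class ReportParser {
            public static void main(String[] args) throws Exception {
                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder()
                        .parse(new File("consolidation-report.xml"));
                NodeList jobs = doc.getElementsByTagName("job");
                for (int i = 0; i < jobs.getLength(); i++) {
                    Element job = (Element) jobs.item(i);
                    System.out.println(job.getAttribute("id") + ": "
                            + job.getAttribute("documentCount") + " documents");
                }
            }
        }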

    Since the usage reporting is built into the StreamServer, it'll run faster and with less overhead than anything you could script. It may not have all the functionality you want, though (e.g. info about Collector, an adjustable log level/severity filter, the ability to log custom fields such as doctypes and metadata, custom text fields...). Put together a list of your requirements, and post an Enhancement Suggestion on the PM site.

     

    Another option would be to use the database logging introduced in SP4. In Control Center, you go to Log Configuration, enable "Database logging" and set the log level. (Refer to each application's logmanager.xml file.) Such logging happens on a per-territory basis, and the log data is stored in the Runtime Repository. You could query it with direct SQL calls, or with Web Services calls through Service Gateway. Unfortunately, I don't have any documentation for doing this.

    This approach requires the least amount of scripting, wouldn't slow down the core StreamServer processing as much as a fully custom solution, keeps your projects from getting too weird, and uses standard logging (so you don't have to keep track of two separate sets of log messages). The existing log() scripting command is easy to use and understand; you've been using it for years. Default queries against the database log can be performed with date (including year), time (down to milliseconds), Job ID, Thread ID, log level, error code, error texts... Insert your own texts as you see fit. Instead of having to create your own framework, you could just focus on the queries. If you can get some more details from R&D about the necessary Web Services calls or database tables to hit, this would be the best option. And post your thoughts and enhancement suggestions to PM.
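
    Just to show the shape such a query could take (the table and column names below are pure guesses; you would need R&D to confirm the real schema), a direct SQL call via JDBC might look like this:

        // Hypothetical JDBC query against the database log.
        // "LOGEVENT", "JOBID", "LOGTIME" etc. and the connection URL are assumed,
        // not the documented schema.
        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.PreparedStatement;
        import java.sql.ResultSet;

        public class LogQuery {
            public static void main(String[] args) throws Exception {
                try (Connection con = DriverManager.getConnection(
                        "jdbc:oracle:thin:@dbhost:1521:strs", "user", "password");
                     PreparedStatement ps = con.prepareStatement(
                        "SELECT JOBID, LOGTIME, LOGLEVEL, LOGTEXT FROM LOGEVENT WHERE JOBID = ?")) {
                    ps.setString(1, args[0]);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            System.out.println(rs.getString("LOGTIME") + " "
                                    + rs.getString("LOGTEXT"));
                        }
                    }
                }
            }
        }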

     

    Your last option would be to somewhat reinvent the wheel. You'd want:

    • a usage client to run on each StreamServer instance. Ideally, this would be as small as possible. I'd write a small function file, which contains all the environment variables and connection parameters, and the function just does an HTTP POST (a rough sketch follows at the end of this post). HTTP is relatively lightweight, and allows you to keep the client as simple as possible.
    • a usage server to collect all the data from all the StreamServer instances. This would probably be a multithreaded StreamServer instance with an HTTP IN connector. It would store centrally to a database of some sort. If you didn't want this component, you could integrate the logging-to-database functionality into your usage client. But then each client would have to write directly to the database, and that increases your projects' complexity, and would make them run a LOT slower.
    • a usage repository. I'd use a relational database. You could use the filesystem, but queries are easier with a database. Besides, with a database, you don't have to worry about concurrent writes.
    • a reporting server. This would probably be a custom web app which runs queries against the database server. The customer could write this in Java, Ruby, PHP...whatever.
    • a reporting client. Probably a web browser.
    This last option would require the most work. But you would have greater control over what gets logged, and how the data structures look. This could be a good thing, or a very bad thing.
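
    Here is the sketch I mentioned: a minimal illustration of the usage client's HTTP POST, written in Java for readability (in practice this would be a StreamServe function file; the URL and payload fields are my own assumptions):

        // Minimal usage client sketch: POST one usage record to the usage server.
        import java.io.OutputStream;
        import java.net.HttpURLConnection;
        import java.net.URL;
        import java.nio.charset.StandardCharsets;

        public class UsageClient {
            public static void send(String jobId, int docCount) throws Exception {
                // The usage server would be the StreamServer instance with the HTTP IN connector.
                URL url = new URL("http://usageserver:8080/usage");
                HttpURLConnection con = (HttpURLConnection) url.openConnection();
                con.setRequestMethod("POST");
                con.setDoOutput(true);
                con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
                String body = "jobid=" + jobId + "&doccount=" + docCount;
                try (OutputStream out = con.getOutputStream()) {
                    out.write(body.getBytes(StandardCharsets.UTF_8));
                }
                if (con.getResponseCode() != 200) {
                    System.err.println("Usage server returned " + con.getResponseCode());
                }
                con.disconnect();
            }
        }
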
    Monday 09 August, 2010
  • Vyv Lomax Administrator
    0 likes

    Hi Jesper, your discussion points very much to the units of input files / events / processes and output files, which is largely covered by the Communication Reporter, although I can easily imagine that you want to get in under the hood in order to expand it to your liking.

    If we call this kind of reporting 'StreamServe Technical Reporting', then consideration should also be given to incorporating some form of 'Business Document Reporting'. E.g. the number of documents issued by the ERP/CRM/legacy system versus the number processed by StreamServe; the number of invoices, their types, their sums, their coverages etc.; the number of order documents, EDI orders, ASNs etc. It gets tricky of course when there are Post Processing activities involved: the number of inserts / envelopes / accounting when co-enveloping per division / customer etc. It is quite a complex relationship that needs to be transparent, at least to some degree.

    I believe that there should be provision for this from within the product, as larger production runs require thorough reporting, exception reporting / warnings and the like, on both a technical level and a business level - fully customizable.

    I have had some brief discussions with colleagues about having a real-time network diagram of all processing activities - but again with support from the product out of the box. Using visual network diagrams to develop Post Processing runs is really very tempting!

    Moving along, another possibility is to forget the real-time aspect and go with a batch approach. To easily combine all technical and business reporting, you could always log all the details you require heavily into the log file. The log file would then be parsed by an application that consolidates / sorts the IDs and generates a complete picture for you. This can be done on a simple PC and need not affect the StreamServe server. Your projects could use a series of function files; you would just have to include them in a project template and fill in the IDs.
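
    As a rough illustration of such a parser (the log line format and IDs below are invented for the example), it could be as simple as this:

        // Sketch: consolidate a heavily-logged file by job ID.
        // Assumes each statistics line looks like "STAT;<jobid>;<doctype>" - an invented format.
        import java.io.BufferedReader;
        import java.io.FileReader;
        import java.util.HashMap;
        import java.util.Map;

        public class LogConsolidator {
            public static void main(String[] args) throws Exception {
                Map<String, Integer> docsPerJob = new HashMap<>();
                try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        if (!line.startsWith("STAT;")) continue;  // skip ordinary log lines
                        String jobId = line.split(";")[1];
                        docsPerJob.merge(jobId, 1, Integer::sum);
                    }
                }
                docsPerJob.forEach((job, count) ->
                        System.out.println(job + ": " + count + " documents"));
            }
        }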

    I would like to know why a real-time system is so important. Updating other systems? Planning capacity for outsourcing?

    Designing the system so that it does not behave strangely can be done in the design environment - maybe it is too late to catch it misbehaving in a production environment?

    Please let me know what you think.

    //557

    Tuesday 10 August, 2010
  • Stefan Cohen StreamServe Employee Administrator
    0 likes

    A very interesting subject.

    David's reply covers most aspects of this question. Here are a few additional things to consider:

    Alt1:

    The IncProcStatCounter() function can be used to create custom counters in the USR.

    Alt2:

    The database tables for the logs have been designed for performance and there is no end-user documentation available yet, so creating queries can be tricky. I think you will need assistance from R&D to sort that out as of now, but this alternative should give you a good start.


    Alt3:

    Asynchronously parsing the standard log file (or a custom log file written with fileopen, filewriteln, etc.) using a custom usage client is another option.

    Consider writing the usage client as a small executable and calling it using the execute() function (available from 5.5).

    For better performance, aggregate the statistics in batches and submit them to the usage server late (at job end) or asynchronously.
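
    A rough sketch of that batching idea (all names are illustrative; the hand-off at the end is whatever transport you choose):

        // Sketch: aggregate per-document statistics in memory and submit once, at job end.
        import java.util.HashMap;
        import java.util.Map;

        public class StatBatch {
            private final Map<String, Integer> counters = new HashMap<>();

            // Called once per document - cheap, no I/O.
            public void count(String outputType) {
                counters.merge(outputType, 1, Integer::sum);
            }

            // Called once at job end - one submission instead of one per document.
            public void flush(String jobId) {
                StringBuilder body = new StringBuilder("jobid=" + jobId);
                for (Map.Entry<String, Integer> e : counters.entrySet()) {
                    body.append('&').append(e.getKey()).append('=').append(e.getValue());
                }
                // Hand off to the usage client here (e.g. an HTTP POST as in David's sketch).
                System.out.println("Submitting: " + body);
                counters.clear();
            }
        }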

     

    I think the local script variables in 5.5 will help you achieve good portability of your solution regardless of which alternative you select.

     

    Regards,

    Stefan

    Tuesday 10 August, 2010
  • Jesper Werge StreamServe Employee
    0 likes

    Hi all,

    Thank you for your answers, I really appreciate them :)

    I will look into the Communication Reporter and see if I can manage to get the information needed for the requirements.

    Second, I will look into building the wheel once more, as this will give me a chance to write the PM suggestion more precisely. However, it might be OK performance-wise if it can be done with standard StreamServe functionality and without too much scripting :).

    I will get back to this post to write about which approach was selected.

    /Jesper  

    Wednesday 11 August, 2010
  • Robert Smith OpenText Employee StreamServe Employee
    0 likes

    Hi

    This is an old thread, but I just wanted to bring up the new "Java notifications" feature that exists in SP5, and even more so in the latest SP5 EP1. These new features are really very neat, and if/when we make a new USR we will probably base it on these notifications. With these you don't need to change the projects either.

    Volodymyr Ilchenko wrote an interesting article on StreamShare a while ago that shows some usage of it.

    Wednesday 26 October, 2011
  • Stefan Cohen StreamServe Employee Administrator
    0 likes

    Here's the link to the article Robert mentioned above: http://streamshare.streamserve.com/Articles/Article/?articleId=436

     

    //Stefan

    Thursday 27 October, 2011
  • Jesper Werge StreamServe Employee
    0 likes

    Hi Robert and Stefan,

    Thank you very much for the interesting article and information; I am looking forward to working more with this.

     

    We have built a small database for statistics, using ODBC as the connection. It is actually very fast and has not compromised the processing time; the service handles 35,000 invoices per hour (Linux/Oracle).

    I am looking forward to changing the integration over to these notifications.

    /Jesper

    Tuesday 01 November, 2011

 
