Process querying studies methods for automated management of real-world and envisioned processes, process models, process repositories, and process knowledge within modern organizations. To this end, process querying applies and contributes results in theoretical computer science fundamentals (e.g., distributed and parallel computing, model checking, and formal methods), software engineering, information systems, programming languages, workflow management, and business process management.

PQL Utilities

PQL has been implemented and is publicly available under the open source GNU Lesser General Public License. You can access the source code here. The implementation exhibits a well-defined application programming interface (API) to facilitate its integration with other software products. This API can be accessed by users via command-line interfaces (CLIs) of two utilities: the PQL bot and the PQL tool. These two utilities enable the PQL environment. The PQL bot is used to prepare models for querying. It constructs indexes of models’ behavioral relations. The PQL tool should be used to execute PQL queries over the indexed models. The latest versions of the utilities are available here.

The PQL Bot

The PQL bot is a standalone utility that can be used to systematically index models stored by the PQL tool. Once a model is indexed, it can be matched to a query using the PQL tool. One can start multiple PQL bot instances simultaneously to index several models in parallel.

A call to the PQL indexing routine takes a workflow system described in the Petri Net Markup Language (PNML) format as input. PNML is an XML-based syntax for high-level Petri nets, designed as a standard interchange format that lets Petri net tools exchange Petri net models. For many high-level process modeling languages, such as WS-BPEL, EPC, and BPMN, there exist mappings to the Petri net formalism. As a result, the PQL environment can work with models captured in many mainstream notations and developed using a wide range of modeling tools.

Table 1: CLI options of the PQL bot

| Option | Option (short) | Parameter | Description |
|---|---|---|---|
| --help | -h | | Print help message |
| --index | -i | [number] | Maximal indexing time (in seconds) |
| --name | -n | [string] | Name of this bot (maximum 36 characters) |
| --sleep | -s | [number] | Time to sleep between indexing jobs (in seconds) |
| --version | -v | | Get version of this bot |

When initializing a PQL bot instance, one can configure it via the CLI. Table 1 lists the options of the PQL bot CLI. Every PQL bot has a unique name, which can be assigned using option -n. If this option is not used, a random unique name is assigned. Once started, a PQL bot instance indexes stored (but not yet indexed) models in succession. One can use CLI options -s and -i to specify the time to sleep, i.e., to stay idle, between two successive indexing tasks and the maximal time to attempt indexing of a model, respectively. If these options are not used, the parameters are taken from the configuration file. If indexing of a model cannot be completed within the given time frame, the model is marked as one that cannot be indexed using this version of the bot, and the bot proceeds with indexing the next model. The -h and -v CLI options of the PQL bot can be used to print the help message and to get the version of the invoked PQL bot instance, respectively.
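The options described above can be assembled programmatically when several bots are started at once. The following Python sketch only builds the argument lists; it does not start any process. The helper names `bot_command` and `bot_fleet` are hypothetical, and the jar name and the `-n=`, `-s=`, `-i=` syntax are assumed to match the sample invocation on this page.

```python
BOT_JAR = "PQL.BOT-1.0.jar"  # assumption: jar name as used in the sample invocation


def bot_command(name=None, sleep_seconds=None, max_index_seconds=None, jar=BOT_JAR):
    """Build the argument list for one PQL bot instance (hypothetical helper)."""
    args = ["java", "-jar", jar]
    if name is not None:
        # Table 1 limits bot names to 36 characters.
        if len(name) > 36:
            raise ValueError("bot name must not exceed 36 characters")
        args.append(f"-n={name}")
    # Options that are left out fall back to the configuration file.
    if sleep_seconds is not None:
        args.append(f"-s={int(sleep_seconds)}")
    if max_index_seconds is not None:
        args.append(f"-i={int(max_index_seconds)}")
    return args


def bot_fleet(names, sleep_seconds=None, max_index_seconds=None):
    """Build one command per bot name; names must be unique."""
    names = list(names)
    if len(set(names)) != len(names):
        raise ValueError("bot names must be unique")
    return [bot_command(n, sleep_seconds, max_index_seconds) for n in names]


if __name__ == "__main__":
    for command in bot_fleet(["Brisbane", "Sydney"], 60, 86400):
        print(" ".join(command))
```

Each returned list can be passed to a process launcher such as `subprocess.Popen` to run the bots in parallel.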

Once started, the PQL bot runs as a background process until it is shut down. An example of the command line output of a PQL bot instance is shown below.

>> java -jar PQL.BOT-1.0.jar -n=Brisbane -s=60 -i=86400
>> =======================================================================
>> Process Query Language (PQL) Bot ver. 1.0
>> =======================================================================
>> Name:               Brisbane
>> Sleep time:         60s
>> Max. index time:    86400s
>> =======================================================================
>> 10:45:18.487 Brisbane - There are no pending jobs
>> 10:45:18.487 Brisbane - Sent an alive message
>> 10:45:18.497 Brisbane - Going to sleep for 60 seconds
>> 10:46:18.505 Brisbane - Woke up
>> 10:46:18.525 Brisbane - Retrieved indexing job for the model with ID 1
>> 10:46:18.575 Brisbane - Start checking model with ID 1
>> 10:46:23.506 Brisbane - Finished checking model with ID 1
>> 10:46:23.506 Brisbane - Start indexing model with ID 1
>> 10:47:03.608 Brisbane - Finished indexing model with ID 1
>> 10:47:03.608 Brisbane - Going to sleep for 60 seconds
>> 10:48:03.613 Brisbane - Woke up
>> 10:48:03.623 Brisbane - Retrieved indexing job for the model with ID 2
>> 10:48:03.673 Brisbane - Start checking model with ID 2
>> 10:48:13.248 Brisbane - Finished checking model with ID 2
>> 10:48:13.249 Brisbane - Start indexing model with ID 2
>> 10:49:52.679 Brisbane - Finished indexing model with ID 2
>> 10:49:52.679 Brisbane - Going to sleep for 60 seconds
>> 10:50:52.704 Brisbane - Woke up
>> 10:50:52.704 Brisbane - There are no pending jobs
>> ...
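Output of this kind can be post-processed to see how long each model took to index. The sketch below is a minimal Python example that assumes the line layout shown in the sample output (an optional `>>` prefix, a timestamp, the bot name, and a message); the function name `indexing_durations` is hypothetical and not part of PQL.

```python
import re
from datetime import datetime

# Matches lines such as:
#   >> 10:46:23.506 Brisbane - Start indexing model with ID 1
LINE = re.compile(
    r"^(?:>>\s*)?(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) (?P<bot>\S+) - "
    r"(?P<event>Start|Finished) indexing model with ID (?P<id>\d+)$"
)


def indexing_durations(lines):
    """Return {model_id: seconds} for every completed indexing job in the log."""
    started = {}
    durations = {}
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # ignore checking, sleeping, and banner lines
        # The log carries no date, so jobs that run past midnight are not handled.
        moment = datetime.strptime(match["time"], "%H:%M:%S.%f")
        model_id = int(match["id"])
        if match["event"] == "Start":
            started[model_id] = moment
        elif model_id in started:
            durations[model_id] = (moment - started.pop(model_id)).total_seconds()
    return durations


if __name__ == "__main__":
    sample = [
        ">> 10:46:23.506 Brisbane - Start indexing model with ID 1",
        ">> 10:47:03.608 Brisbane - Finished indexing model with ID 1",
    ]
    print(indexing_durations(sample))
```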

The PQL Tool

The PQL tool can be used to store, index, delete, and query process models. Table 2 lists CLI options of the PQL tool. The tool allows a user to store a given model (option -s), check if a model can be indexed (option -c), index a model (option -i), delete a model and its index (option -d), visualize the parse tree of a given query (option -p), execute a query (option -q), and reset the PQL environment (option -r). The CLI can also be used to access help information (option -h) and information about the version of the tool (option -v).

Table 2: CLI options of the PQL tool

| Option | Option (short) | Parameter | Description | Requires option |
|---|---|---|---|---|
| --check | -c | | Check if model can be indexed | -id |
| --delete | -d | | Delete model (and its index) | -id |
| --help | -h | | Print help message | |
| --index | -i | | Index model | -id |
| --identifier | -id | [string] | Model identifier | -id |
| --parse | -p | | Show PQL query parse tree | -pql |
| --pnmlPath | -pnml | [path] | Path to a PNML file or a folder with PNML files | |
| --pqlPath | -pql | [path] | Path to a file with a PQL query | |
| --query | -q | | Execute PQL query | -pql |
| --reset | -r | | Reset this PQL instance | |
| --store | -s | | Store model | -pnml (-id) |
| --version | -v | | Get version of this tool | |

Once a fresh PQL environment is deployed, a user may start using it by storing process models. To store models, the CLI option -s must be accompanied by option -pnml, which specifies a path either to a single PNML file or to a directory that contains PNML files. If a path to a single PNML file is used, the call to the PQL tool must include option -id to specify a unique identifier to associate with the model. Otherwise, the tool attempts to store the models using their file names as unique identifiers. Once stored, a model can be indexed by a PQL bot instance or by the PQL tool using the CLI option -i accompanied by option -id, which specifies the unique identifier that was used to store the model. When indexing a model, the PQL tool uses the same routines as the PQL bot.

Note that the dynamic semantics of the first edition of PQL is implemented over sound workflow systems, a special class of Petri net systems. One can check whether a given Petri net system is a sound workflow system by calling the PQL tool with option -c. Note also that every request to index a model in the PQL environment is automatically preceded by a soundness check of this model. A user may delete a stored model using option -d. By deleting a model, the user also deletes its index. Both options -c and -d require option -id to uniquely identify the model to be checked or deleted, respectively. To execute a PQL query, a user can use option -q together with option -pql, which specifies a path to a file that contains a query captured using the grammar of PQL. To visualize the parse tree of a PQL query, one can use option -p together with option -pql. Finally, one can reset the PQL environment using option -r. By resetting the environment, one deletes all stored models and indexes.
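The dependencies between options listed in Table 2 can be checked before the tool is invoked. The Python sketch below builds an argument list for a single PQL tool call and rejects calls that lack a required companion option. The helper `tool_command` is hypothetical, and the `-pnml=` and `-id=` value syntax is an assumption modeled on the `-pql=query.pql` form used in the sample invocation on this page.

```python
TOOL_JAR = "PQL.TOOL-1.0.jar"  # assumption: jar name as used in the sample invocation

# Companion options each action needs, following Table 2.
# For -s, option -id is only mandatory when -pnml points to a single file,
# which this sketch cannot tell, so only -pnml is enforced here.
REQUIRED = {
    "-c": ("id",),
    "-d": ("id",),
    "-i": ("id",),
    "-p": ("pql",),
    "-q": ("pql",),
    "-s": ("pnml",),
    "-r": (),
    "-h": (),
    "-v": (),
}


def tool_command(action, pnml=None, model_id=None, pql=None, jar=TOOL_JAR):
    """Build the argument list for one PQL tool call (hypothetical helper)."""
    if action not in REQUIRED:
        raise ValueError(f"unknown action: {action}")
    values = {"pnml": pnml, "id": model_id, "pql": pql}
    for option in REQUIRED[action]:
        if values[option] is None:
            raise ValueError(f"{action} requires option -{option}")
    args = ["java", "-jar", jar, action]
    args += [f"-{key}={value}" for key, value in values.items() if value is not None]
    return args


if __name__ == "__main__":
    # Store a model, index it, then run a query against the indexed models.
    steps = [
        tool_command("-s", pnml="order.pnml", model_id="order"),
        tool_command("-i", model_id="order"),
        tool_command("-q", pql="query.pql"),
    ]
    for step in steps:
        print(" ".join(step))
```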

The PQL tool supports multi-threaded querying. A user can configure the number of query threads in the tool’s configuration file. In response to a PQL query, the tool returns a collection of matching models and the activity labels that were used to retrieve the models. An example of the command line output of executing a PQL query using the PQL tool is shown below.

>> java -jar PQL.TOOL-1.0.jar -q -pql=query.pql
>> PQL query:  SELECT * FROM * WHERE AlwaysOccurs(~"process payment");
>> Attributes: [UNIVERSE]
>> Locations:  [UNIVERSE]
>> Task:       "process payment"[0.75] -> ["process payment by cash",
>>             "process payment by check"]
>> Result:     [2]

Sample Configuration File

The PQL bot and the PQL tool load their initial configuration parameters from the same PQL.ini configuration file. A sample PQL.ini file is shown below.

url = jdbc:mysql://localhost:3306/pql
user = root
password = password

host = localhost
name = themis
user = user
password = password

lolaPath = .\\lola2\\win\\lola.exe

labelSimilaritySearch = lucene
labelSimilarityConfig = ./lucene/
defaultLabelSimilarityThreshold = 0.75
indexedLabelSimilarityThresholds = 0.5,0.75,1.0
numberOfQueryThreads = 4

defaultBotSleepTime = 5
defaultBotMaxIndexTime = 86400

To improve execution times of PQL queries, the PQL tool relies on an index of behavioral relations, a special data structure that speeds up the computation of behavioral relations at the cost of the time needed to construct it and the space needed to store it. The tool uses this index at runtime to avoid computing PQL predicates from scratch every time a new query is issued. The index is stored in MySQL, a relational database management system (RDBMS) that is one of the most widely used open source database systems. The user needs to configure connection parameters to the PQL MySQL schema in the [mysql] section of the configuration file. The latest version of the PQL MySQL schema can be obtained here.

The user can configure the PQL environment to use one of the three integrated information retrieval engines for scoring PQL label similarities. These are Apache Lucene, Themis-IR, and an implementation of the label similarity scoring approach based on the Levenshtein distance. Note that the Apache Lucene and Themis-IR engines use the vector space model to perform label similarity assessments. The sample configuration file listed above is configured to use Apache Lucene, which is also the default option of the PQL tool. The configuration file specifies that the default label similarity threshold to be used when assessing similar activity labels is set to 0.75, i.e., a PQL string ~"process payment" refers to all activity labels that have a similarity score (according to Apache Lucene) with "process payment" of at least 0.75. If Themis-IR is used as the label similarity engine, the user needs to configure PostgreSQL connection parameters to an instance of the Themis-IR engine. Finally, the configuration parameter numberOfQueryThreads can be used to configure the number of threads to use when computing PQL queries.
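To illustrate how a similarity threshold selects labels, the following Python sketch scores labels with a normalized Levenshtein distance and keeps those at or above the threshold. It is a generic illustration, not the scoring used by PQL: the normalization by the longer label, the lower-casing, and the function names are all assumptions, and the scores it produces will differ from those of Apache Lucene or Themis-IR.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Score in [0, 1]; 1.0 means the labels are equal ignoring case (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest


def similar_labels(query, labels, threshold=0.75):
    """Return the labels whose similarity to the query reaches the threshold."""
    return [label for label in labels if similarity(query, label) >= threshold]


if __name__ == "__main__":
    labels = ["process payment", "process payments", "ship goods"]
    print(similar_labels("process payment", labels))
```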

When indexing models, the PQL bot uses the solutions to the reachability and covering problems implemented in the LoLA tool ver. 2.0. The user can configure a path to the LoLA tool using the lolaPath parameter of the [lola] section of the configuration file.

Finally, the user can configure the PQL bot parameters using the [bot] section of the configuration file. These are the time to sleep between two consecutive indexing jobs and the maximal time to attempt indexing a model (both in seconds).
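The fallback from CLI options to configuration values can be sketched with Python's standard `configparser` module. The excerpt below contains only the three sections named in the text ([mysql], [lola], and [bot]) with values copied from the sample file; the function `bot_settings` and the returned key names are hypothetical, and the sketch does not reflect how the PQL utilities read the file internally.

```python
import configparser

# Excerpt of a PQL.ini file limited to the sections named in the text.
SAMPLE = r"""
[mysql]
url = jdbc:mysql://localhost:3306/pql
user = root
password = password

[lola]
lolaPath = .\\lola2\\win\\lola.exe

[bot]
defaultBotSleepTime = 5
defaultBotMaxIndexTime = 86400
"""


def load_config(text):
    """Parse INI text; interpolation is disabled so values are read verbatim."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def bot_settings(config, cli_sleep=None, cli_max_index=None):
    """CLI values win; otherwise the [bot] defaults from the file are used."""
    bot = config["bot"]
    return {
        "sleep_seconds": cli_sleep if cli_sleep is not None
        else bot.getint("defaultBotSleepTime"),
        "max_index_seconds": cli_max_index if cli_max_index is not None
        else bot.getint("defaultBotMaxIndexTime"),
    }


config = load_config(SAMPLE)

if __name__ == "__main__":
    print(bot_settings(config))                 # defaults from the file
    print(bot_settings(config, cli_sleep=60))   # -s=60 given on the command line
```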