XML Parsing

XML files are processed slightly differently from other files. How a file is processed is defined by the format descriptor. The backend process splits the XML source file into a number of entities, defined by the tables in the descriptor. During the load, each entity is processed separately, in order from parent to child, so the approach differs from loading a normal file (such as CSV). The differences are the following:

  • The resulting table, as defined in the task properties, does not hold the data of the XML file; it only lists the different entities. For each entity, it gives the name of the table in which that entity's data can be found. The data of the XML file is therefore stored in several different tables rather than in the table defined in the task properties.
  • The Stored Procedure (which processes the data) is fired once for each entity, directly after that entity has been loaded. If there are n entities, the procedure is executed n times. Note that when an entity is processed, its child entities might not yet have been loaded. To identify which entity is being processed, a parameter ‘TEMPFILENAME’ can be defined in the Stored Procedure. This parameter contains the file name (which corresponds to the table name) of the entity that has just been loaded and requires processing.
  • The data for each entity remains in its corresponding table throughout the entire load. It is removed only when the same task is re-executed.
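
The load sequence described above can be illustrated with a minimal Python sketch. It is not the product's implementation: the function names, the "tmp_" table prefix, the sample XML and the callback signature are all assumptions made for illustration, and entities are simply taken to be element tags rather than tables from a format descriptor. It shows the three points from the list: the task table only lists entities, the procedure runs once per entity with the entity's table name, and child entities are not yet loaded when a parent is processed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample file; a real file would be described by a format descriptor.
SAMPLE_XML = """
<orders>
  <order id="1">
    <line sku="A" qty="2"/>
    <line sku="B" qty="1"/>
  </order>
  <order id="2">
    <line sku="C" qty="5"/>
  </order>
</orders>
"""


def split_into_entities(xml_text):
    """Group elements by tag, with tags ordered from parent to child.

    Simplification: one entity per tag name, discovered by a breadth-first walk.
    """
    root = ET.fromstring(xml_text)
    entities = {}  # dicts keep insertion order, so parents come first
    queue = [root]
    while queue:
        elem = queue.pop(0)
        entities.setdefault(elem.tag, []).append(dict(elem.attrib))
        queue.extend(list(elem))  # children are visited after all current-level nodes
    return entities


def load_file(xml_text, procedure):
    """Load each entity into its own table, then call the procedure for it."""
    index_table = []    # stands in for the table from the task properties
    entity_tables = {}  # stands in for the per-entity tables
    for name, rows in split_into_entities(xml_text).items():
        table_name = "tmp_" + name  # assumed naming scheme
        entity_tables[table_name] = rows
        index_table.append({"entity": name, "table": table_name})
        # Fired directly after this entity is loaded; child tables may not exist yet.
        procedure(table_name, sorted(entity_tables))
    return index_table, entity_tables


if __name__ == "__main__":
    def process(temp_file_name, loaded_tables):
        # temp_file_name plays the role of the TEMPFILENAME parameter.
        print("processing", temp_file_name, "loaded so far:", loaded_tables)

    index, tables = load_file(SAMPLE_XML, process)
    print(index)
```

In this sketch the callback receives the list of tables loaded so far, which makes it easy to see that a procedure handling the parent entity cannot yet rely on the child tables being present.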