Thursday 3 May 2018

Apache Nifi - IV

This post continues from our previous post on Apache Nifi, where we ran a simple flow comprising just three steps. For all the work in this post, we will be using the latest version of Apache Nifi, 1.6.0.

In this post, we continue to explore Apache Nifi with a focus on transforming file formats. While the earlier post merely moved files from one folder to another, the first flow that we look at here adds one step that transforms a CSV file into a JSON file. More details on the JSON format can be seen here. The flow is shown below:




















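For readers who think in code, the three steps of this flow can be approximated outside NiFi with a short Python sketch. This is only an illustration of the idea, not how NiFi implements it: the function name run_flow and its directory arguments are hypothetical, the first line of each CSV file is assumed to be a header, every value is kept as a string, and the output file is assumed to keep the original file name. Unlike a real flow, the sketch leaves the source files in place.

```python
import csv
import json
from pathlib import Path


def run_flow(input_dir, output_dir):
    """Pick up CSV files, convert each to JSON, and write the result out."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(input_dir).glob("*.csv")):
        # Step 1 (GetFile-like): read the file from the input directory.
        with src.open(newline="", encoding="utf-8") as handle:
            # Step 2 (ConvertRecord-like): the header row supplies field names.
            records = list(csv.DictReader(handle))
        # Step 3 (PutFile-like): write the JSON to the output directory.
        dest = out / src.name
        dest.write_text(json.dumps(records), encoding="utf-8")
        written.append(dest.name)
    return written
```

Calling run_flow with an input and an output directory returns the names of the files it wrote, which makes it easy to check that each CSV file was picked up exactly once.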

Note that the GetFile and PutFile processors are the same as in the last post. The additional processor is called ConvertRecord. The properties of GetFile are shown below:
















All properties keep their default values; only the Input Directory is set. For the ConvertRecord processor, the properties are shown below:
















Only two property values are set: Record Reader is set to CSVReader and Record Writer to JsonRecordSetWriter. However, we still see warnings, as shown below:


















Before we enable the Controller Services, we need to set a few properties on the CSVReader and JsonRecordSetWriter Controller Services. Clicking on the arrow next to CSVReader as shown below takes us to the Controller Services tab under NiFi Flow Configuration:






















Note that both services are disabled. Click on the Configure wheel shown above. In the properties shown, change only the Treat First Line as Header property to true, as shown below:















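To see why this property matters, here is a small Python sketch of the idea (it is not NiFi's implementation). The helper read_records and the fallback names column_1, column_2 and so on are hypothetical and exist only for this illustration; in NiFi, when the first line is not treated as a header, the field names would have to come from a schema instead.

```python
import csv
import io


def read_records(text, first_line_is_header):
    """Parse CSV text into a list of dicts, one per data row."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    if first_line_is_header:
        # The first line supplies the field names and is not a record.
        names, data = rows[0], rows[1:]
    else:
        # Hypothetical fallback names, used only for this illustration.
        names = [f"column_{i + 1}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(names, row)) for row in data]


sample = "id,name\n1,Ann\n"
print(read_records(sample, True))
print(read_records(sample, False))
```

With the flag off, the header line itself shows up as a data record, which is usually not what we want for a file that starts with column names.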

Similarly, in the properties of the JsonRecordSetWriter Controller Service, set the Schema Write Strategy property to Do Not Write Schema, as shown below:
















After these properties are set, enable the Controller Services in NiFi Flow Configuration by clicking on the enable icon, as shown below:






Once both Controller Services are enabled, the flow is ready to run, as shown below:

















We are using a .csv file to run the flow. We can see below the status after a successful run:


















Right-clicking on the ConvertRecord processor to check the data provenance allows us to see the transformation of the format from CSV to JSON.
















Click on the View button on the left to see the input. The input is shown below:









Click on the View button on the left to see the output of this processor. The output is shown below:









Select formatted in the View as: dropdown to view the data in JSON format, as shown below:



















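The formatted view changes only the presentation, not the content. As a rough analogy in Python (the sample records are made up for illustration and are not the data used in this post), pretty-printing a compact JSON string yields the same data:

```python
import json

# Made-up sample standing in for the compact output of the flow.
compact = '[{"id":"1","name":"Ann"},{"id":"2","name":"Bob"}]'

records = json.loads(compact)
# indent=2 adds line breaks and indentation without changing the values.
formatted = json.dumps(records, indent=2)
print(formatted)

# Both strings decode to the same records.
assert json.loads(formatted) == records
```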

We can see that the data has been transformed from CSV format to JSON format. With this, we conclude this post.