Documentation
< All Topics
Print

2.4 Transformer

Momentum provides SQL-based interface for data transformation. Data cleaning, null removal, datatype conversion, column renaming, mathematical transformation, blending, merging, joining with multiple data sources are some of the transformation tasks that can be performed over data created within Momentum. Anything that is supported by ANSI-standard SQL can be performed using Momentum’s transformation engine. A multi-level SQL based transformation is a powerful way of performing complex transformation tasks involving single or multiple data sources. 

In this section we will explore how to perform a single SQL-based as well as multi-step transformations. 

Single Step Transformer 

In the following example, binary formatted date is transformed into human readable date using standard SQL. To do this: 

  1. Form the left hand side navigation panel, expand Transformer” and then click “Transformer Home”. 
  1. Click “New Transformer” located at the top menu option. The transformer window opens. 
  1. The transformer window is divided into two sections: 
  1. The left side section containing a form is used to write transformation SQL, and 
  1. the right-side section shows all data created within Momentum by various components. Clicking on any of the data sources will display 100 rows of data by default.  When the data is displayed, it shows a query block at the top where you can write any SQL statements on top of this datasource to test out any potential transformation query. This view is provided to look at the data while writing the transformation query on the left side section.  See Figure 2.5 below for an example. 

Figure 2.5: View showing 100 records of data to help in writing transformation SQL on the left section 

  1. Notice that the DATE column in the above screenshot is in binary format. We will write SQL to transform that column to a human readable date format. Fill out the form on the left section to configure the transformer as described below (and shown in Figure 2.6 below) 

Figure 2.5: Transformer example showing SQL statement 

  1. Name: give a unique and meaningful name to this transformer 
  1. Transformer Query: Write SQL statement to transform your data. Notice that the table name contains the first two letters of the components that generated data, followed by a dot, followed by the username and another dot, and the name of the component. For example, our datasource table name is “io.sansari.machine_data_ingester” where “io” stands for ingester output”. 
  1. In our example, the SQL statements to convert binary DATE into human readable date is:  

SELECT *, from_unixtime(to_unix_timestamp(`DATE`, ‘yyyy-MM-dd’),’yyyy-MM-dd HH:mm:ss’)  as TRANSFORMED_DATE from  io.sansari.machine_data_ingester 

  1. Select output format, parquet being the default. 
  1. Max core: specify how many CPU core of the cluster your transformer should run on concurrently to perform parallel operations. For example, 4 cores of CPU is good enough for small to mid-size data. For larger dataset, the more that core, the faster the processing will be. 
  1. RAM: specify how much RAM each CPU core should occupy. 4GB per core is a good default for most cases and should be larger for very large dataset. 
  1. Submit to save the transformer configuration. If everything goes well, the page will transition to the Transformer Home page. 
  1. From the Transformer Home page, check the transformer just created, and click the “Run” button located at the top menu bar. 
  1. It might take a few seconds to provision and start the transformation process, and depending on the data size, it make take some time to complete the transformation. 
  1. Click on the “Transformer” menu option from the top menu to refresh the transformer status, shown below in Figure 2.6. 

Figure 2.6: Screen showing the transformation run status. 

  1. Click on the Logs link to monitor the logs and watch for any errors. 
  1. After the transformer is completed successfully, checkbox, and click on “View Data” to take a quick look of the data created by the transformation (see Figure 2.7 below and notice the TRANFORMED_DATE column has human readable dates). See more on data exploration, “Exploring Data”, below.  

Figure 2.7: A sample data generated by the transformer 

Multi Step Transformer 

To demonstrate multi-step transformation process, we will work on an example that creates training and test sets needed for machine learning model training. The steps are as follows. 

Creating Training and Test Sets Using Transformer 

There are many ways to create training and test sets from the dataset we ingested. This section demonstrates how to use SQL compliant transformer to create the two sets for machine learning training and test. 

Assuming we want to create 80% training and 20% test sets from the original 10,000 records. We also want to randomize the data so that the training and test sets are not biased. We will create two transformers: 

  • The training set transformer will have a randomized 8000 records 
  • The test set transformer will have 2000 records that are not in the training set. 

Creating Training Set 

Here are the steps to create the training set: 

  1. Expand “Transformer” and click “Transformer Home” to launch the transformer home page. 
  1. Click “New Transformer” from the top menu to open the form to write transformation SQL. Provide a meaningful name to this transformer, e.g. machine_data_training_set 
  1. In the Transformer Query block, write SQL statement as follows. Notice that the SQL statement contains the ingester name or a previously transformed dataset name. 
  1. Save the Transformer. 
  1. Select the transformer and click “Run” located at the top menu bar. 
  1. After the transformer is successfully done, use the data exploration and interactive query tools to check the data. 

SELECT * FROM tr.sansari.machine_data_transformer order by RAND() limit 8000 

Listing 2: Transformer SQL to generate training set 

Figure 2.8: Transformer to create training set 

Notice that the transformer page shows a list of data sources on the right-side panel. You can click on any data item to see the data in tabular form. This view is provided to give an aid to the user while writing the SQL statements for the transformer. 

Creating Test Set 

This section demonstrates a multi-step transformation process, a powerful way of transforming complex data into meaningful forms. Here are the steps to create the test set: 

  1. Click New Transformer at the top menu bar. Give a meaningful name to the transformer, such as machine_data_test_set 
  1. Write a simple “Select” statement to load the data from the original data source, e.g. machine_data_transformer. Logically, the output of this statement will be stored in a temp table called “S1”. 

SELECT * FROM tr.sansari.machine_data_transformer 

Listing 3: SQL statement to load all the data from a transformer 

  1. Click “Add SQL Query” to open another query block. We will write a simple SQL statement to load the data from the training set transformer that we created previously. The output of this SQL is logically stored in another temp table called “S2”. 

SELECT * FROM tr.sansari.machine_data_training_set 

Listing 4: SQL statement to load transformer data that created training set 

  1. The final SQL statement will simply perform a MINUS operation between the two SQL statements above.  

select * from s1 minus select * from s2 

Listing 5: SQL statement to perform MINUS operation 

The following Figure 2.9 shows the three SQL statements as you will see in Momentum Transformer page. 

Figure 2.9: Multi-step transformer to create test set 

  1. Save and run this transformer. Check the result from the Data Upload and Exploration utilities. 

Notes: 

  1. If the column name is a SQL reserved keyword, such as DATE, backquote that column, e.g. `DATE` within the SQL statements. 
  1. The transformer stores data of the last run only. In other words, if you run the transformer multiple times, only the last run output will be stored in transformer.  
  1. Add the transformer in a data pipeline if you want incremental data or full data as a result of the transformation. See “Setting Data Pipeline” section. 
Table of Contents

Lester Firstenberger

Lester is recognized nationally as a regulatory attorney and expert in consumer finance, securitization, mortgage, and banking law.

Lester is recognized nationally as a regulatory attorney and expert in consumer finance, securitization, mortgage, and banking law. In a variety of capacities, over the past 30 years as an attorney, Mr. Firstenberger has represented the interests of numerous financial institutions in transactions valued in excess of one trillion dollars. He was appointed to and served a three-year term as a member of the Consumer Advisory Council of the Board of Governors of the Federal Reserve System. He has extensive governmental relations experience in the US and Canada at both the federal and state and provincial levels.

Shamshad (Sam) Ansari is an author, inventor, and thought leader in the fields of computer vision, machine learning, artificial intelligence, and cognitive science. He has extensive experience in high scale, distributed, and parallel computing. Sam currently serves as an Adjunct Professor at George Mason University, teaching graduate- level programs within the Data Analytics Engineering department of the Volgenau School of Engineering. His areas of instruction encompass machine learning, natural language processing, and computer vision, where he imparts his knowledge and expertise to aspiring professionals.

Having authored multiple publications on topics such as machine learning, RFID, and high-scale enterprise computing, Sam’s contributions extend beyond academia. Sam’s book, titled “Building Computer Vision Applications Using Artificial Neural Networks,” has garnered acclaim with two published editions. It received recognition as one of the top 10 books ever written on this subject by bookauthority.org, highlighting the significant impact and quality of Sam’s contributions to the field. He holds four US patents related to healthcare AI, showcasing his innovative mindset and practical application of technology.

Throughout his extensive 20+ years of experience in enterprise software development, Sam has been involved with several tech startups and early-stage companies. He has played pivotal roles in building and expanding tech teams from the ground up, contributing to their eventual acquisition by larger organizations. At the beginning of his career, he worked with esteemed institutions such as the US Department of Defense (DOD) and IBM, honing his skills and knowledge in the industry.

Currently, Sam serves as the President and CEO of Accure, Inc., an AI company that he founded. He is the creator, architect, and a significant contributor to Momentum AI, a no-code platform that encompasses data engineering, machine learning, AI, MLOps, data warehousing, and business intelligence. Throughout his career, Sam has made notable contributions in various domains including healthcare, retail, supply chain, banking and finance, and manufacturing. Demonstrating his leadership skills, he has successfully managed teams of software engineers, data scientists, and DevSecOps professionals, leading them to deliver exceptional results. Sam earned his bachelor’s degree in engineering from Birsa Institute of Technology (BIT) Sindri and subsequently a Master’s degree from the prestigious Indian Institute of Information Technology and Management Kerala (IIITM-K).

Moghisuddin Raza

Mogishuddin Raza is a technology leader. As the COO of Accure he is having global product delivery responsibility along with overall strategic and operational responsibility.

Mogishuddin Raza is a technology leader. As the COO of Accure he is having global product delivery responsibility along with overall strategic and operational responsibility.

Having extensive background in technology product development and integration, in particular to Enterprise storage, virtualization, cloud computing, high availability & business continuity technology/solutions, and Big Data & related technologies. Has been passionate and evangelizing the usage of Big data technologies using Momentum to implement advanced analytics (descriptive and predictive) to directly impact the business via an intuitive set of use cases.

Having approximately two decades of experience in high-tech industries which includes big MNCs corporate like EMC Corp and Hewlett-Packard to mid-size organization such as Netkraft, Trados Inc driving transformation in strategizing, planning and architecting product engineering, execution and delivery of high quality products releases within budget & time.

Skilled in all aspects of big MNCs as well as company startups and growth including: strategizing, business planning, market research, finance, product development and profit margins & revenue management. Excellent leadership and people motivation skills. Expert in managing cross-functional, cross cultural global team and building strategic partnership in the global virtual matrix team environment.

Overall, a senior software business professional, skilled in the management of people, resources and partnerships which enables building an eco system for a winning organization.



Rajesh Kumar Nedungadi

Scion of A Former Royal House of Kerala, India President Garuttman Group, USA. Rajesh is an entrepreneur & visionary specializing in International Business Strategy and Market Development with focus on Middle East & North America. With over 20 years’ experience in international trade, Rajesh is an expert on Business Strategy Development, Market Opportunity Development and International Sales & Marketing of Products and Services including the IT Industry. Rajesh is working as Managing Partner / Board Member of many companies including, Globistic Company USA, Castlewick Companies, USA.

Former Gartner Analyst having authored or co-authored over 280 research notes, on emerging technologies like AI, SD-WAN, 5G, mobile video, cloud CDN, IoT, SASE in Cybersecurity, 6G, etc. in the past decade. A frequent speaker at tech events, he is often quoted in leading institutions like CNN, Wall St. Journal, etc. He is a former CTO of one of the first video/WiFi smartphone firms. Currently also a Cybersecurity Advisor at Lionfish Tech Advisors, and on the advisory board for a few startups.

 Sharma helped contribute to fly-by-wire standards used in avionics and holds an Engineering degree in Computer Systems Engineering from Carleton University in Canada and completed graduate coursework in AI/ML from there.

His Engineering Thesis was on the application of Hopfield’s Neural Networks applied to Combinatorial Optimization.

Mike joined Accure as the Senior Vice President of Sales after leaving MarkLogic where he worked for 17 years and held several executive sales positions including his last role leading the global sales strategy. Prior to MarkLogic, Mike spent 10 years at Autonomy as VP of sales for their Business Process Management practice. He began his career in database technology with Informix in the Chicago area where he led Channel Sales for Central US.

When not driving Accure’s sales, Mike spends time with his lovely wife, Meg, two daughters, Leigh Rose and Georgia, and his puppy Kalua (“Lulu”).