Informatica rolls out new integrations for Databricks’ cloud data platform
Informatica Inc. is rolling out new integrations for Databricks Inc.’s cloud data platform that will help joint customers process their business information more efficiently.
The integrations were developed as part of an expanded partnership that the companies announced today. In conjunction, Informatica is launching a technical blueprint designed to help customers build applications based on DBRX, an open-source language model that Databricks released in March.
Information’s flagship offering is a data management platform called IDMC. It reduces the amount of manual work involved in moving data between a company’s internal systems. The software also promises to ease several related tasks, such as combining information from different sources and removing erroneous records.
The first new integration that Informatica introduced for Databricks’ platform today is called Native SQL ETL. It works with Databricks SQL, a data warehouse service included in the platform.
One of the tasks that Informatica’s flagship IDMC software promises to simplify for customers is creating ETL pipelines. Those are automation workflows that manage the process of moving business information between applications. ETL pipelines lend themselves to, among other tasks, loading data from a company’s business applications into its Databricks environment for analysis.
ETL pipelines historically ran on separate infrastructure than the target system, in this case Databricks, into which they load information. Using the new integration that Informatica debuted today, ETL pipelines can carry out their computations directly in Databricks’ platform rather than on separate infrastructure as is the usual practice. This arrangement can help make the data loading possess more efficient.
In conjunction, Informatica is enhancing the integration between its IDMC platform and Databricks’ Unity Catalog. The latter tool allows software teams to access data from their company’s internal systems via a centralized interface, as well as regulate who can access that information.
Informatica says that the integration will improve the data lineage features available to customers. Using those features, a company can examine how a record changed over time to determine whether errors may have found their way into the file. Removing erroneous information from the datasets used by analytics tools helps ensure that those tools produce accurate results.
Alongside the new integrations, Informatica debuted a new technical template dubbed GenAI Solution Blueprint for Databricks DBRX. It’s designed to help companies build AI applications with RAG or retrieval-augmented generation features. RAG allows an AI application to incorporate information from outside its model’s training dataset into prompt responses.
The template has two main components. The first is Informatica’s IDMC platform while the other is DBRX, an open-source large language model that Databricks released earlier this year. The model features 132 billion parameters and was trained on a dataset comprising 12 trillion tokens.
As part of today’s product updates, Informatica is also making its CDI-Free offering available through Partner Connect, a third-party software marketplace built into Databricks’ platform. CDI-Free is a scaled-down version of Informatica’s ETL tool that is available at no charge. ETL pipelines created using the software can process up to 20 million rows of data per month.
Image: Informatica
A message from John Furrier, co-founder of SiliconANGLE:
Your vote of support is important to us and it helps us keep the content FREE.
One click below supports our mission to provide free, deep, and relevant content.
Join our community on YouTube
Join the community that includes more than 15,000 #CubeAlumni experts, including Amazon.com CEO Andy Jassy, Dell Technologies founder and CEO Michael Dell, Intel CEO Pat Gelsinger, and many more luminaries and experts.
THANK YOU