In a nod to the growing importance of data science and AI development on its platform, Snowflake today announced that its upcoming Winter Release will support executing code written in Python, which by some measures is the most popular language in the world and is also the number one language for developing machine learning models.
Support for Python is in private preview and is being added to Snowpark, Snowflake’s compute framework for automating computational workflows for data analytics, data science and data engineering use cases. Snowflake launched Snowpark one year ago with support for Java and Scala, giving users a Spark-like capability to kick off workflows with DataFrames. Now, due to high demand, it’s adding support for Python DataFrames.
“We heard it loud and clear,” said Torsten Grabs, director of product management for Snowflake, referring to customer demand for Python. “Python is the language of choice for many data scientists and many data engineers.”
Whether it’s PyTorch or scikit-learn, most of the popular Python machine learning frameworks will now be supported on Snowflake, the $118-billion cloud data warehousing company that’s giving AWS, Azure, and Google Cloud a run for their data warehousing money.
“What is exciting about it is it essentially brings the whole Python ecosystem to Snowflake, all the libraries and all the packages that the Python community has built,” Grabs says. “We’re welcoming the whole Python community to the Snowflake data platform.”
The popularity of Python has been building for years, and it recently knocked C off its perch as the number one language in the TIOBE Index. While the data science community certainly has driven a lot of Python’s popularity, its usage is also surging among data engineers. That’s just fine with Snowflake, which recorded $592 million in revenue in fiscal 2021 and became headquarterless earlier this year.
“Data scientists [and] advanced analytics are key audiences for us,” Grabs says. “But also we’re seeing Python becoming increasingly more popular with data engineers. It’s also very powerful at scripting for data pipelines, for example.”
Developers can work with Snowpark through a number of IDEs and notebooks. For Python, that includes Visual Studio Code, PyCharm, and Jupyter notebooks. For Java and Scala, Snowflake is supporting the IntelliJ and Eclipse development environments, Grabs says.
Snowflake’s Python environment comes by way of Anaconda, which maintains packages of open source tools that are often used in data science and analytics environments. Snowflake is leveraging Anaconda’s package manager, called Conda, to help keep the Python environments updated and well-behaved from a dependency point of view, Grabs says.
“Some parts that are really important for us was to make sure that we were providing a well-managed environment where you avoid some of the problems that make Python hard to use,” he says. “That’s the reason why we partnered up with Anaconda, to make the package management and dependency management part easier.”
Snowpark supports Python 3.8, with support for more versions of the language planned over time. The company is adopting a Spark-like DataFrame API for Python: developers write DataFrame operations in Python, point them at a table in the Snowflake warehouse, and get the results back.
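Snowflake did not show code, so what follows is only a minimal sketch of the general idea behind a warehouse-backed DataFrame, not Snowpark’s actual API. Spark-style DataFrame APIs are generally lazily evaluated: each call records an operation, and the work happens in the database when results are requested. The sketch assumes SQLite (Python’s built-in sqlite3 module) as a stand-in for the warehouse; the LazyFrame class, its methods, and the calls table are hypothetical.

```python
import sqlite3
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LazyFrame:
    """Hypothetical stand-in for a warehouse-backed DataFrame (not Snowpark).

    Each transformation returns a new frame that only records the operation;
    nothing runs until collect() sends a single SQL query to the database.
    Table and column names are interpolated directly, so only trusted
    identifiers should be passed in.
    """
    conn: sqlite3.Connection
    table: str
    columns: tuple = ("*",)
    filters: tuple = ()

    def select(self, *cols: str) -> "LazyFrame":
        # Record the projection; no data is read yet.
        return replace(self, columns=tuple(cols))

    def filter(self, condition: str) -> "LazyFrame":
        # Record a SQL predicate; multiple filters are ANDed together.
        return replace(self, filters=self.filters + (condition,))

    def to_sql(self) -> str:
        # Compile the recorded operations into one SQL statement.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"({c})" for c in self.filters)
        return sql

    def collect(self) -> list:
        # The only step that does work: the database executes the query.
        return self.conn.execute(self.to_sql()).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (agent TEXT, minutes REAL)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("ana", 12.5), ("bo", 3.0), ("ana", 7.0)],  # toy rows
)

df = LazyFrame(conn, "calls").filter("minutes > 5").select("agent", "minutes")
print(df.to_sql())           # SELECT agent, minutes FROM calls WHERE (minutes > 5)
print(sorted(df.collect()))  # [('ana', 7.0), ('ana', 12.5)]
```

The design choice the sketch illustrates is that the Python code never pulls the whole table into memory; it describes a query, and the engine next to the data does the filtering and projection.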
Snowflake also lets users register the output of a machine learning training run, the trained model, as a user-defined function (UDF) in the Snowflake warehouse, where it can be called from SQL. This is part and parcel of Snowflake’s plan to serve its customers’ analytics as well as machine learning use cases.
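The article does not describe how registration works in Snowpark, so the following is a hedged sketch of the concept only, again using SQLite as a stand-in warehouse: a model is trained in Python on data pulled from the database, then registered under a name that SQL queries can call. sqlite3.Connection.create_function is a real standard-library call; the tables, the toy least-squares “model,” and the names fit_line, make_predictor, and predict_satisfaction are hypothetical, and this is not Snowflake’s UDF API.

```python
import sqlite3


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (toy 'model')."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def make_predictor(slope, intercept):
    # The closure carries the trained parameters, standing in for a saved model.
    def predict(hold_minutes):
        return slope * hold_minutes + intercept
    return predict


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (hold_minutes REAL, satisfaction REAL)")
conn.executemany(
    "INSERT INTO history VALUES (?, ?)",
    [(1.0, 9.0), (2.0, 8.0), (3.0, 7.0), (4.0, 6.0)],  # toy training data
)

# "Training run": pull rows out of the database and fit the model in Python.
rows = conn.execute("SELECT hold_minutes, satisfaction FROM history").fetchall()
slope, intercept = fit_line([r[0] for r in rows], [r[1] for r in rows])

# Register the trained model under a SQL-callable name that takes one argument.
conn.create_function("predict_satisfaction", 1, make_predictor(slope, intercept))

conn.execute("CREATE TABLE new_calls (id INTEGER, hold_minutes REAL)")
conn.executemany("INSERT INTO new_calls VALUES (?, ?)", [(1, 0.5), (2, 6.0)])

# Scoring now happens in plain SQL, next to the data.
scores = conn.execute(
    "SELECT id, predict_satisfaction(hold_minutes) FROM new_calls ORDER BY id"
).fetchall()
print(scores)  # [(1, 9.5), (2, 4.0)]
```

Once registered, the model is just another function to a SQL user, which is the kind of cross-language mixing Grabs describes.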
“All of that runs on the same compute infrastructure, so we’re not adding a separate product just for Python,” Grabs says. “We’re actually integrating Python into the existing runtime and the compute infrastructure, so that the benefits around scale and performance accrue to your Python workload as much as they would accrue to a SQL-based workload or a Java-based workload. And that then gives you the ability to mix and match and compose across these language boundaries, depending on the user preferences.”
On a data cloud such as Snowflake’s, the boundaries between data analytics workloads and data science workloads start to melt away.
“The boundaries between these silos that we had in the past, let’s say between the data science profession, the data engineering profession, and then the analytics profession–we see those silos become less and less relevant over time,” Grabs says. “So these boundaries we expect to go away. And there are huge benefits to that as well. Through data cloud, you want to get access to all sorts of data and not to limit access to one particular silo…that the data is relevant across different departments, different functions.”
The Winter Release of Snowpark is bringing other new features to Snowflake customers, including a new logging framework, support for processing unstructured files, and support for stored procedures. These capabilities are initially available for Scala and Java, with Python support to follow.
Support for stored procedures will let customers run control flow or driver logic on Snowflake compute rather than on a separate VM, Grabs says, while the new logging framework will let customers log messages from their custom code.
The unstructured file support will open the door to new types of analytics and ML use cases in Snowpark, such as the ability to process audio files of call center interactions, Grabs says. “There’s a lot of potential there to leverage data science and machine learning, but they’re also important workloads that operate on structured and semi-structured [data], so it’s not limited to just unstructured data,” he says.
Snowflake executives Benoit Dageville, co-founder and president of product, and Christian Kleinerman, SVP of product, will discuss these new features at the company’s Snowday virtual event today. You can sign up for the event on Snowflake’s website.