joshita / dev

Published

- 1 min read

Latest updates about Apache Spark


What is the Apache Spark engine?

Apache Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Java, Scala, Python, and R, and an optimized engine that supports general execution graphs. You can download it directly from the downloads section of the official website: https://spark.apache.org/downloads.html

NOTE: Previous releases of Spark may be affected by security issues. Please consult the Security page for a list of known issues that may affect the version you download before deciding to use it.

Latest Spark version: 3.5.2

  1. It's a maintenance release containing security and correctness fixes.

  2. It includes many bug fixes, for example:

    • TimestampNTZ type inference on Parquet files
    • Use Java 17 instead of 17-jre image in K8s Dockerfile
    • group by all should be idempotent
    • Memory leak when interrupting shuffle write using zstd compression
    • Fix the data corruption issue when state store unload and snapshotting happens concurrently for HDFS state store
    • CSV parsing failure with char/varchar type columns
  3. It also includes a number of dependency changes.
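
Since older releases may carry known security issues, it can be handy to check whether a given Spark version string is older than 3.5.2. The snippet below is only a minimal sketch: the helper names `parse_version` and `is_older_than` are made up for illustration, and it assumes the version string starts with `major.minor.patch` (in PySpark, the installed version string is available as `pyspark.__version__`).

```python
import re
from typing import Tuple

# Baseline taken from this post: the 3.5.2 maintenance release.
BASELINE: Tuple[int, int, int] = (3, 5, 2)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a 'major.minor.patch' prefix, ignoring suffixes such as '-SNAPSHOT'.

    Hypothetical helper for illustration; not part of Spark.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def is_older_than(version: str, baseline: Tuple[int, int, int] = BASELINE) -> bool:
    """Return True if `version` is strictly older than `baseline`.

    Components are compared as integers, so '3.10.0' is newer than '3.5.2'.
    """
    return parse_version(version) < baseline
```

For instance, `is_older_than("3.5.1")` is true, while `is_older_than("3.5.2")` and `is_older_than("3.10.0")` are false, because each component is compared numerically and not as text. A version check like this does not replace reading the Security page; it only flags that an upgrade may be worth considering.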

If you want to contribute to Spark, check out the details on the contribution page: https://spark.apache.org/contributing.html

For the latest news on Apache Spark, check out the news page: https://spark.apache.org/news/