A Brief Overview of PySpark Logging

This article gives a brief overview of how to write log messages using PySpark logging.

Log Properties Configuration

I. Go to the conf folder located in the PySpark directory.

$ cd spark-2.4.0-bin-hadoop2.7/conf

II. Modify the log4j.properties.template file by appending these lines:

# Define the root logger with Appender file
log4j.rootLogger=WARN, console

# Define the file appender
log4j.appender.FILE=org.apache.log4j.DailyRollingFileAppender

# Name of the log file
log4j.appender.FILE.File=/tmp/logfile.out

# Set immediate flush to true
log4j.appender.FILE.ImmediateFlush=true 

# Set the threshold to DEBUG mode
log4j.appender.FILE.Threshold=debug

# Set File append to true
log4j.appender.FILE.Append=true

# Set the Default Date pattern
log4j.appender.FILE.DatePattern='.'yyyy-MM-dd

# Default layout for the appender
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%m%n

Then, save the configuration file and rename it from log4j.properties.template to log4j.properties.
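
For example, from inside the conf folder you can simply copy the template to the new name, keeping the original untouched:

$ cp log4j.properties.template log4j.properties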

You can adjust the configuration file in accordance with your specific needs. Please visit the log4j documentation for more information. However, the above configuration is quite enough for a simple logging task.
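
For instance, if you want each log line to carry a timestamp, the level, and the logger name, a conversion pattern along these lines should do the job (a minimal sketch; the exact pattern is up to you):

log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %p %c{1}: %m%n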


Use Logging in Your PySpark Code

Execute this simple code:

from pyspark import SparkContext, SparkConf

# Create a Spark context for the application
conf = SparkConf().setAppName("pyspark-logging")
sc = SparkContext(conf=conf)

# Reach the JVM's log4j package through Py4J
log4jLogger = sc._jvm.org.apache.log4j

# Grab a logger and emit a message at every level
log = log4jLogger.LogManager.getLogger(__name__)
log.trace("Trace Message!")
log.debug("Debug Message!")
log.info("Info Message!")
log.warn("Warn Message!")
log.error("Error Message!")
log.fatal("Fatal Message!")
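
Assuming you save the snippet above as pyspark_logging_example.py (the file name here is just for illustration), you can run it with spark-submit. With the root logger set to WARN, only the warn, error, and fatal messages should actually show up:

$ spark-submit pyspark_logging_example.py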

If you want to write the log messages into a file, you can modify these lines in the properties file:

log4j.rootLogger=WARN, FILE
log4j.appender.FILE.File=path_to_the_log_file

FYI, you can use any name for the appender. In this case, we’re using ‘FILE’ as the appender’s name.
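
For example, if you renamed the appender to LOGFILE instead (the name LOGFILE is just for illustration), every related property key would change with it:

log4j.rootLogger=WARN, LOGFILE
log4j.appender.LOGFILE=org.apache.log4j.DailyRollingFileAppender
log4j.appender.LOGFILE.File=/tmp/logfile.out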