Spark Installation
== Install PySpark using pip ==

```sh
pip install pyspark
```

Set the necessary environment variables. PySpark needs to know the path to your Java installation. For Linux or macOS:
```sh
export JAVA_HOME=<path_to_java>
export SPARK_HOME=$(python -c "import site; print(site.getsitepackages()[0])")/pyspark
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3
```

Replace `<path_to_java>` with the actual path to your Java installation. On macOS, you can find it using the following command:
```sh
/usr/libexec/java_home
```

For Windows:
```powershell
$env:JAVA_HOME = "<path_to_java>"
$env:SPARK_HOME = (python -c "import site; print(site.getsitepackages()[0])") + "\pyspark"
$env:PYSPARK_PYTHON = "python3"
$env:PYSPARK_DRIVER_PYTHON = "python3"
```

Replace `<path_to_java>` with the actual path to your Java installation.
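Before starting Spark, the setup on either platform can be sanity-checked from Python. A minimal sketch (the variable list simply mirrors the exports above; the helper name is my own):

```python
import os

# Variables that the export / $env: commands above should have set.
REQUIRED_VARS = ("JAVA_HOME", "SPARK_HOME", "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")

def missing_vars(env=os.environ):
    """Return the names of required variables that are empty or unset."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All PySpark environment variables are set.")
```

If anything is reported as not set, re-run the export (or `$env:`) commands in the same shell you will launch Python from, since environment variables do not carry over between sessions.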
Test your PySpark installation: Create a new Python script or open an interactive Python shell, and run the following code:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("PySpark Test") \
    .getOrCreate()

data = [("Alice", 1), ("Bob", 2), ("Cathy", 3)]
columns = ["Name", "ID"]

df = spark.createDataFrame(data, columns)
df.show()
```

If the installation is successful, you will see the DataFrame displayed as output.
(Optional) Install additional packages: you may need additional packages for your PySpark projects, such as pandas, numpy, or matplotlib. Install them using pip:

```sh
pip install pandas numpy matplotlib
```

With PySpark installed, you can now start working on your data processing and machine learning projects.