Spark Installation

DB CAFE


Revision as of 16:05, 23 March 2023 (Thu)



1 Download the files required for installation

  1. Spark with Hadoop: https://spark.apache.org/downloads.html
  2. Java JDK
  3. winutils (for Windows): https://github.com/cdarlint/winutils
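On Windows, Spark expects winutils.exe to sit under %HADOOP_HOME%\bin. As a rough sketch (the helper names are illustrative, not part of Spark's API), you can compute and check that location from Python:

```python
from pathlib import Path

def winutils_path(hadoop_home):
    # Spark on Windows looks for winutils.exe in %HADOOP_HOME%\bin
    return Path(hadoop_home) / "bin" / "winutils.exe"

def winutils_present(hadoop_home):
    # True once winutils.exe has been placed where Spark expects it
    return winutils_path(hadoop_home).is_file()
```

Point HADOOP_HOME at the folder containing bin\winutils.exe before starting Spark.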

1.1 Installing PySpark (using pip)

pip install pyspark

Set the necessary environment variables. PySpark needs to know the path to your Java installation.

1.2 Installation on Linux or macOS

export JAVA_HOME=<path_to_java>
export SPARK_HOME=$(python -c "import site; print(site.getsitepackages()[0])")/pyspark
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3

Replace <path_to_java> with the actual path to your Java installation. On macOS you can find it with the following command (Linux users should point JAVA_HOME at their JDK directory instead):

/usr/libexec/java_home
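As a quick sanity check (the helper name is illustrative), you can verify from Python that the variables above are visible to the process that will launch Spark:

```python
import os

def missing_spark_vars(env=None):
    # Return the PySpark-related variables that are not set or empty
    env = os.environ if env is None else env
    required = ["JAVA_HOME", "SPARK_HOME", "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"]
    return [v for v in required if not env.get(v)]
```

An empty result means all four variables are set; otherwise the returned names tell you what to export.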

1.3 Installation on Windows

$env:JAVA_HOME = "<path_to_java>"
$env:SPARK_HOME = (python -c "import site; print(site.getsitepackages()[0])") + "\pyspark"
$env:PYSPARK_PYTHON = "python"
$env:PYSPARK_DRIVER_PYTHON = "python"

Replace <path_to_java> with the actual path to your Java installation.

1.4 Testing the PySpark installation

Create a new Python script or open an interactive Python shell, and run the following code:

from pyspark.sql import SparkSession

# Start (or reuse) a local Spark session
spark = SparkSession.builder \
    .appName("PySpark Test") \
    .getOrCreate()

# Build a small DataFrame from an in-memory list of tuples
data = [("Alice", 1), ("Bob", 2), ("Cathy", 3)]
columns = ["Name", "ID"]

df = spark.createDataFrame(data, columns)
df.show()

If the installation is successful, you will see the DataFrame displayed as output.

(Optional) Install additional packages: You may need additional packages for your PySpark projects, such as pandas, numpy, or matplotlib. Install them using pip:

pip install pandas numpy matplotlib
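To confirm that the optional packages actually installed into the environment PySpark will use, a small standard-library check (the function name is illustrative):

```python
import importlib.util

def installed(package):
    # True if the package can be imported in the current environment
    return importlib.util.find_spec(package) is not None

# e.g. [p for p in ("pandas", "numpy", "matplotlib") if not installed(p)]
```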

With PySpark installed, you can now start working on your data processing and machine learning projects.