Spark installation

DB CAFE

Revision as of 14:45, Thursday, 23 March 2023



1 Installing PySpark (using pip)

pip install pyspark
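To confirm that pip installed the package into the same Python interpreter you will use later, a small check like the following can help. This is a minimal sketch assuming Python 3.8 or later (for importlib.metadata); the function name installed_pyspark_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_pyspark_version() -> Optional[str]:
    """Return the installed PySpark version string, or None if pip has not
    installed it for the interpreter running this code."""
    try:
        return version("pyspark")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pyspark_version()
    if found is None:
        print("pyspark is not installed for this Python interpreter")
    else:
        print(f"pyspark {found} is installed")
```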

Next, set the required environment variables. Spark runs on the Java Virtual Machine, so PySpark needs to know where your Java installation is.

1.1 Installation on Linux or macOS

export JAVA_HOME=<path_to_java>
export SPARK_HOME=$(python3 -c "import os, pyspark; print(os.path.dirname(pyspark.__file__))")
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3

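Replace <path_to_java> with the actual path to your Java installation. These export commands apply only to the current shell session; to keep them, add them to your shell startup file (for example ~/.bashrc or ~/.zshrc). On macOS, you can print the Java path with the following command: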

/usr/libexec/java_home
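If you prefer to derive these values from the Python interpreter itself instead of typing them by hand, the sketch below prints matching export lines. It is an illustration under assumptions: spark_env_exports and pyspark_package_dir are hypothetical helpers, the package directory is located with importlib.util.find_spec, and PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are set to sys.executable (the interpreter running the script) rather than the bare name python3.

```python
import importlib.util
import os
import shlex
import sys
from typing import Dict, List, Optional


def pyspark_package_dir() -> Optional[str]:
    """Directory of the installed pyspark package, or None if it cannot be found."""
    spec = importlib.util.find_spec("pyspark")
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at pyspark/__init__.py; its parent is the package root.
    return os.path.dirname(spec.origin)


def spark_env_exports(java_home: str) -> List[str]:
    """Build POSIX-shell export lines for the variables described above."""
    env: Dict[str, str] = {"JAVA_HOME": java_home}
    spark_home = pyspark_package_dir()
    if spark_home is not None:
        env["SPARK_HOME"] = spark_home
    # Assumption: use the interpreter running this script for both the
    # driver and the Python workers so that they match.
    env["PYSPARK_PYTHON"] = sys.executable
    env["PYSPARK_DRIVER_PYTHON"] = sys.executable
    # shlex.quote protects paths that contain spaces or shell metacharacters.
    return [f"export {name}={shlex.quote(value)}" for name, value in env.items()]


if __name__ == "__main__":
    # Placeholder as in the text; replace it with your actual Java path.
    for line in spark_env_exports("<path_to_java>"):
        print(line)
```

The printed lines can be pasted into the same shell startup file mentioned above.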

1.2 Installation on Windows

$env:JAVA_HOME = "<path_to_java>"
$env:SPARK_HOME = (python -c "import os, pyspark; print(os.path.dirname(pyspark.__file__))")
$env:PYSPARK_PYTHON = "python"
$env:PYSPARK_DRIVER_PYTHON = "python"

Replace <path_to_java> with the actual path to your Java installation. Variables set with $env: apply only to the current PowerShell session.
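A wrong JAVA_HOME is a common reason Spark fails to start. The sketch below, using the hypothetical helper name java_executable, checks that a directory actually contains a Java launcher (bin/java on Linux or macOS, bin\java.exe on Windows). It only tests that the file exists; it does not check the Java version.

```python
import os
from typing import Optional


def java_executable(java_home: str) -> Optional[str]:
    """Return the path of the java launcher under java_home, or None if absent."""
    # Try the Windows name first, then the Linux/macOS name.
    for name in ("java.exe", "java"):
        candidate = os.path.join(java_home, "bin", name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    home = os.environ.get("JAVA_HOME")
    if not home:
        print("JAVA_HOME is not set")
    else:
        launcher = java_executable(home)
        if launcher is None:
            print(f"No java launcher found in {os.path.join(home, 'bin')}")
        else:
            print(f"Using {launcher}")
```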

1.3 Testing the PySpark installation

Create a new Python script or open an interactive Python shell, and run the following code:

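from pyspark.sql import SparkSession

# Start a Spark session, or reuse one that is already running
spark = SparkSession.builder \
    .appName("PySpark Test") \
    .getOrCreate()

# Build a small DataFrame from Python tuples and column names
data = [("Alice", 1), ("Bob", 2), ("Cathy", 3)]
columns = ["Name", "ID"]

df = spark.createDataFrame(data, columns)
df.show()  # prints the rows as a text table

# Release the session's resources when finished
spark.stop()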

If the installation is successful, df.show() prints the three rows as a table with the columns Name and ID.

(Optional) Install additional packages: You may need additional packages for your PySpark projects, such as pandas, numpy, or matplotlib. Install them using pip:

pip install pandas numpy matplotlib
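To see which of these optional packages are already available before running pip, a minimal sketch could look up their import names with importlib.util.find_spec, which locates a module without importing it. The helper name missing_packages is hypothetical, and it assumes top-level import names; for pandas, numpy, and matplotlib the import name matches the pip package name, but that is not true of every package.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(import_names: Iterable[str]) -> List[str]:
    """Return the top-level import names the current interpreter cannot find."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["pandas", "numpy", "matplotlib"])
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All optional packages are already installed")
```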

With PySpark installed, you can now start working on your data processing and machine learning projects.