== Install PySpark using pip ==

<source lang=shell>
pip install pyspark
</source>
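
As a quick sanity check (not part of the original steps, and assuming the python on your PATH is the interpreter you installed into), you can confirm that the package imports and print its version:

<source lang=shell>
# Print the installed PySpark version to confirm the package imports cleanly
python -c "import pyspark; print(pyspark.__version__)"
</source>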

Set the necessary environment variables. PySpark needs to know the path to your Java installation.

For Linux or macOS:

<source lang=shell>
export JAVA_HOME=<path_to_java>
export SPARK_HOME=$(python -c "import site; print(site.getsitepackages()[0])")/pyspark
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3
</source>
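
These export lines only affect the current shell session. As a sketch, assuming a bash shell, you could append them to your profile so they persist across sessions (use ~/.zshrc instead for zsh):

<source lang=shell>
# Persist the variables for future sessions (bash example)
echo 'export JAVA_HOME=<path_to_java>' >> ~/.bashrc
echo 'export SPARK_HOME=$(python -c "import site; print(site.getsitepackages()[0])")/pyspark' >> ~/.bashrc
echo 'export PYSPARK_PYTHON=python3' >> ~/.bashrc
echo 'export PYSPARK_DRIVER_PYTHON=python3' >> ~/.bashrc
source ~/.bashrc
</source>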

Replace <path_to_java> with the actual path to your Java installation. On macOS, you can find it using the following command:

<source lang=shell>
/usr/libexec/java_home
</source>
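
The command above is macOS-specific. On Linux, one common approach (assuming the java binary is on your PATH and GNU coreutils is available) is to resolve the binary's real path and strip the trailing /bin/java:

<source lang=shell>
# Follow symlinks to the real java binary, then drop the /bin/java suffix to get JAVA_HOME
readlink -f "$(which java)" | sed 's:/bin/java::'
</source>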

For Windows:

<source lang=powershell>
$env:JAVA_HOME = "<path_to_java>"
$env:SPARK_HOME = (python -c "import site; print(site.getsitepackages()[0])") + "\pyspark"
$env:PYSPARK_PYTHON = "python3"
$env:PYSPARK_DRIVER_PYTHON = "python3"
</source>

Replace <path_to_java> with the actual path to your Java installation. Note that variables set with $env: only apply to the current PowerShell session.

Test your PySpark installation: Create a new Python script or open an interactive Python shell, and run the following code:

<source lang=python>
from pyspark.sql import SparkSession

# Create (or reuse) a local SparkSession
spark = SparkSession.builder \
    .appName("PySpark Test") \
    .getOrCreate()

# Build a small DataFrame and print it to verify the installation
data = [("Alice", 1), ("Bob", 2), ("Cathy", 3)]
columns = ["Name", "ID"]

df = spark.createDataFrame(data, columns)
df.show()
</source>

If the installation is successful, you will see the DataFrame displayed as output.

(Optional) Install additional packages: You may need additional packages for your PySpark projects, such as pandas, numpy, or matplotlib. Install them using pip:

<source lang=shell>
pip install pandas numpy matplotlib
</source>
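
If you want to verify these as well, the same kind of import check used above works here (again assuming the python on your PATH is the interpreter you installed into):

<source lang=shell>
# Confirm the extra packages import and print their versions
python -c "import pandas, numpy, matplotlib; print(pandas.__version__, numpy.__version__, matplotlib.__version__)"
</source>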

With PySpark installed, you can now start working on your data processing and machine learning projects.