Spark Installation (스파크 설치) - DB CAFE
Revision as of 14:42, 23 March 2023 (Thu)
== Installing PySpark with pip ==
<source lang=shell>
pip install pyspark
</source>
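Before going further, you can confirm that pip actually made the package importable. A minimal check using only the standard library (it prints whether the pyspark module can be located, without importing it):

```python
import importlib.util

# Look up the pyspark package without importing it; find_spec returns
# None when the package is not installed in the current environment.
spec = importlib.util.find_spec("pyspark")
print("pyspark found" if spec is not None else "pyspark missing")
```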
Set the necessary environment variables. PySpark needs to know the path to your Java installation.

For Linux or macOS:
<source lang=shell>
export JAVA_HOME=<path_to_java>
export SPARK_HOME=$(python -c "import site; print(site.getsitepackages()[0])")/pyspark
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3
</source>
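The SPARK_HOME export above builds the path from Python's site-packages directory. The same computation can be reproduced in Python itself to see where pip placed the package (a sketch; the printed directory only exists if pyspark is installed):

```python
import os
import site

# Mirror the shell expression: first site-packages directory + "pyspark".
spark_home = os.path.join(site.getsitepackages()[0], "pyspark")
print(spark_home)
```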
Replace <path_to_java> with the actual path to your Java installation. On macOS you can find it with the following command (on Linux, where this command does not exist, <code>readlink -f $(which java)</code> resolves the path of the java binary instead):
<source lang=shell>
/usr/libexec/java_home
</source>
For Windows (PowerShell):
<source lang=powershell>
$env:JAVA_HOME = "<path_to_java>"
$env:SPARK_HOME = (python -c "import site; print(site.getsitepackages()[0])") + "\pyspark"
$env:PYSPARK_PYTHON = "python3"
$env:PYSPARK_DRIVER_PYTHON = "python3"
</source>
Replace <path_to_java> with the actual path to your Java installation.
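On either platform, the variables can be sanity-checked from Python before launching Spark. A small sketch (variable names taken from the exports above; anything you have not set yet prints a placeholder):

```python
import os

# Print each variable the guide sets; unset ones show "<not set>".
for var in ("JAVA_HOME", "SPARK_HOME", "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"):
    print(f"{var} = {os.environ.get(var, '<not set>')}")
```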
Test your PySpark installation: create a new Python script or open an interactive Python shell, and run the following code:
<source lang=python>
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("PySpark Test") \
    .getOrCreate()

data = [("Alice", 1), ("Bob", 2), ("Cathy", 3)]
columns = ["Name", "ID"]

df = spark.createDataFrame(data, columns)
df.show()
</source>
If the installation is successful, you will see the DataFrame displayed as output.
(Optional) Install additional packages: you may need additional packages for your PySpark projects, such as pandas, numpy, or matplotlib. Install them using pip:
<source lang=shell>
pip install pandas numpy matplotlib
</source>
With PySpark installed, you can now start working on your data processing and machine learning projects.