ITtutoria is, as its name suggests, a tutorial site and platform for developers. It hosts tutorials on a range of software, mainly related to IT (information technology). The tutorials explain how to use a piece of software, how to fix problems with it, and how to approach coding tasks.
PySpark is the Python API for Apache Spark: it lets Python programs use the Spark framework. It is developed as part of the Apache Spark project. Apache Spark is an open-source computing framework built around speed, ease of use, and streaming analytics.
If you have used Python or SQL, or have worked with Apache Spark, PySpark should be quite easy to learn and may be the best fit for you. It is one of the most popular of Spark's language interfaces, largely because Python itself is so widely used.
If you want to learn PySpark, or specifically the different ways to convert a string to a date in PySpark, visit ITtutoria, the tutorial site and platform for developers.
Two methods to convert a string to a date in PySpark
Method 1: Convert a string to a date with the to_date() function
In this method, you apply to_date() to a string column of a DataFrame and get back a column of type date. The function takes the column that holds the date strings as its first argument and a pattern describing how those strings are laid out as its second argument, for example MM-dd-yyyy. Pattern letters are case-sensitive: yyyy is the year, MM the month, and dd the day of the month.
A common form of the call is
to_date(col("string_column_name"), "yyyy-MM-dd")
To apply this to a real DataFrame, first import the functions required for the conversion, then run the select:
from pyspark.sql.functions import col, to_date
df2 = df1.select(col("column_name"), to_date(col("column_name"), "yyyy-MM-dd").alias("to_date"))
To display the result, run
df2.show()
The names used in these commands are:
- df1: the DataFrame that holds the string column to convert.
- df2: the DataFrame created by converting the string to a date in PySpark.
- to_date: the function that converts a string to a date.
- yyyy-MM-dd: the date pattern. It can also be MM-dd-yyyy or dd-MM-yyyy, depending on how the strings are written. The letters are case-sensitive (yyyy is the year, MM the month, dd the day of the month).
- alias: a column method that gives the converted column a short, readable name. The column appears under that name in the new DataFrame.
A simple DataFrame to try the conversion on is
df1 = spark.createDataFrame(
    data=[("1", "Angela", "2018-18-07 14:01:23.000"), ("2", "Amandy", "2018-21-07 13:04:29.000"), ("3", "Michalle", "2018-24-07 06:03:13.009")],
    schema=["Id", "CustomerName", "timestamp"])
df1.printSchema()
The timestamp strings here are written as year-day-month followed by a time, so the matching pattern is yyyy-dd-MM HH:mm:ss.SSS. Select the timestamp column and convert it to a date column with
from pyspark.sql.functions import col, to_date
df2 = df1.select(col("timestamp"), to_date(col("timestamp"), "yyyy-dd-MM HH:mm:ss.SSS").alias("to_date"))
df2.show()
to_date then returns the dates in the standard yyyy-MM-dd form (the input column is shortened to its date part here):
+----------+----------+
|     input|   to_date|
+----------+----------+
|2018-18-07|2018-07-18|
|2018-21-07|2018-07-21|
|2018-24-07|2018-07-24|
+----------+----------+
Method 2: Convert a string to a date in SQL
You can also use the same to_date function in Spark SQL from PySpark. Its general form is
spark.sql("select '2018-18-07' as input, to_date('2018-18-07', 'yyyy-dd-MM') as to_date").show()
The string is converted to a date, which is displayed as yyyy-MM-dd:
+----------+----------+
|     input|   to_date|
+----------+----------+
|2018-18-07|2018-07-18|
+----------+----------+
Conclusion
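PySpark can convert a string to a date with either of the two methods above. Both call the same to_date function, so the choice is mostly a matter of style. Method 1 is easier when you already work with DataFrames, which is why it is usually best to use Method 1 (convert a string to a date with the to_date() function).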