Logical directed acyclic graph

When a client submits Spark application code, the driver implicitly converts the code containing transformations into a logical directed acyclic graph (DAG). The driver then converts this logical DAG into a physical execution plan consisting of a set of stages. After creating the physical execution plan, it splits each stage into small physical execution units referred to as tasks.
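As a rough illustration, consider the Scala sketch below (the HDFS paths and application name are placeholders, not part of the original article). Each transformation only records a node in the logical DAG; the shuffle introduced by reduceByKey marks a stage boundary, and the final action makes the driver build the physical plan and schedule one task per partition.

import org.apache.spark.sql.SparkSession

object DagExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dag-example").getOrCreate()
    val sc = spark.sparkContext

    val lines  = sc.textFile("hdfs:///data/input.txt")  // nodes in the logical DAG,
    val words  = lines.flatMap(_.split("\\s+"))         // recorded lazily; nothing
    val pairs  = words.map(word => (word, 1))           // has executed yet
    val counts = pairs.reduceByKey(_ + _)               // shuffle => stage boundary

    // The action below triggers execution: the driver converts the logical
    // DAG into stages and splits each stage into per-partition tasks.
    counts.saveAsTextFile("hdfs:///data/output")

    spark.stop()
  }
}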

The driver program then talks to the cluster manager and negotiates for resources. The cluster manager launches executors on the worker nodes on behalf of the driver. The driver then sends tasks to the executors based on data placement, and the executors start executing the tasks assigned to them. While the Spark application is running, the driver program monitors the set of executors.
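The resource negotiation is shaped by the configuration the application supplies. A minimal sketch, assuming a YARN cluster manager; the executor counts and sizes here are illustrative values, not recommendations:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("resource-negotiation-example")
  .master("yarn")                           // cluster manager the driver negotiates with
  .config("spark.executor.instances", "4")  // executors launched on worker nodes
  .config("spark.executor.cores", "2")      // task slots per executor
  .config("spark.executor.memory", "2g")    // memory reserved per executor
  .getOrCreate()

In practice these values are usually passed through spark-submit (--master, --num-executors, --executor-cores, --executor-memory) rather than hard-coded in the application.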

Conclusion: At a high level, the structure of a Spark program is as follows. RDDs are created from the input data, new RDDs are derived from existing RDDs using transformations, and then an action is performed on the data. In any Spark program, the DAG of operations is built by default, and whenever the driver runs the job (through an action), Spark converts the DAG into a physical execution plan.
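One way to inspect the DAG the driver has recorded is RDD.toDebugString, which prints the lineage of an RDD; reusing the counts RDD from the earlier sketch:

// Print the lineage of transformations recorded for `counts`.
// The exact text varies by Spark version; indentation marks where
// a shuffle splits the plan into stages. Roughly:
//   (2) ShuffledRDD[4] at reduceByKey ...
//    +-(2) MapPartitionsRDD[3] at map ...
//       |  MapPartitionsRDD[2] at flatMap ...
//       |  hdfs:///data/input.txt MapPartitionsRDD[1] at textFile ...
println(counts.toDebugString)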
