Access AWS S3 from the Spark shell

In this example, we count the number of records in an S3 bucket with a short Scala program run from the Apache Spark shell.


You will need:

  • Spark
  • An AWS S3 bucket


// Configure the Hadoop S3 connector with your AWS credentials
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", AWS_ACCESS_KEY)
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", AWS_SECRET_KEY)
sc.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
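Note that the s3n connector shown above is a legacy interface; on newer Hadoop builds (2.7 and later) the s3a connector is generally preferred. A minimal sketch of the equivalent configuration, assuming a hadoop-aws jar matching your Hadoop version is on the spark-shell classpath:

```scala
// Equivalent setup using the newer s3a connector
// (assumes the hadoop-aws jar is on the classpath)
sc.hadoopConfiguration.set("fs.s3a.access.key", AWS_ACCESS_KEY)
sc.hadoopConfiguration.set("fs.s3a.secret.key", AWS_SECRET_KEY)

// Paths then use the s3a:// scheme instead of s3n://
val input_file = "s3a://<Bucket_Name>/Path"
```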

val input_file = "s3n://<Bucket_Name>/Path"

// Load the file(s) as an RDD of lines and count the records
val rawdata = sc.textFile(input_file)
val test = rawdata.count

Things to take care of:

  1. Your AWS credentials must have getObject access on the bucket.
  2. "sc" is the SparkContext; it is created automatically by the Spark shell.
  3. Most failures are permission issues: a "403" error usually means the credentials lack access.
  4. You can also verify access from the command line with the aws-cli utility.
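For point 4, a quick way to confirm access before involving Spark is to query the bucket with aws-cli (this assumes aws-cli is installed and configured with the same credentials; the bucket and path below are placeholders):

```shell
# List objects under the path to confirm the credentials can see it
aws s3 ls s3://<Bucket_Name>/Path

# Or exercise getObject directly by downloading one object
# (replace the object key with a real file in your bucket)
aws s3 cp s3://<Bucket_Name>/Path/<object_key> /tmp/check_access
```

If either command fails with "403 Forbidden" or "Access Denied", fix the IAM permissions before debugging the Spark side.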
