Access AWS S3 from spark-shell

In this example, we use a short Scala program in the Apache Spark shell to count the number of records in a file stored in an S3 bucket.


Prerequisites:

  • Spark
  • AWS S3 bucket


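// AWS_ACCESS_KEY and AWS_SECRET_KEY must already be defined as String values.
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", AWS_ACCESS_KEY)
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", AWS_SECRET_KEY)
sc.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")

// Replace <Bucket_Name>/Path with your own bucket and object path.
val input_file = "s3n://<Bucket_Name>/Path"

// Each line of the file becomes one record; count returns the number of lines.
val rawdata = sc.textFile(input_file)
val test = rawdata.count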
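
The snippet above expects AWS_ACCESS_KEY and AWS_SECRET_KEY to be defined before it runs. One possible way to supply them without typing the keys into the shell is sketched below. It assumes the credentials are exported as the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (the names the AWS CLI uses). The helpers requiredEnv and s3nSettings are hypothetical names introduced only for this illustration.

```scala
// Hypothetical helper: return the value of a required variable,
// or fail with a clear message if it is missing.
def requiredEnv(name: String, env: Map[String, String] = sys.env): String =
  env.getOrElse(name, sys.error(s"Environment variable $name is not set"))

// Hypothetical helper: build the Hadoop settings used in this post.
// Assumption: credentials come from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.
def s3nSettings(env: Map[String, String] = sys.env): Map[String, String] = Map(
  "fs.s3n.awsAccessKeyId"     -> requiredEnv("AWS_ACCESS_KEY_ID", env),
  "fs.s3n.awsSecretAccessKey" -> requiredEnv("AWS_SECRET_ACCESS_KEY", env),
  "fs.s3n.impl"               -> "org.apache.hadoop.fs.s3native.NativeS3FileSystem"
)

// In spark-shell, where sc already exists, the settings could be applied with:
// s3nSettings().foreach { case (key, value) => sc.hadoopConfiguration.set(key, value) }
```

Passing the environment as a parameter keeps the helper easy to try out with a plain Map before pointing it at real credentials.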

Things to keep in mind:

  1. Your AWS credentials must have GetObject access to the objects you read from S3.
  2. “sc” is the Spark Context. The Spark shell creates it automatically.
  3. Most problems are permission problems. If you get a “403” error, it usually means the credentials are not allowed to access the bucket or object.
  4. You can also check access from the command line with the aws-cli utility, for example: aws s3 ls s3://<Bucket_Name>/Path
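
As a rough illustration of point 3, the sketch below wraps the count in a Try and reports a permission problem separately from other failures. It assumes the failure surfaces as an exception whose message (or the message of one of its causes) contains "403". The exact exception type and message depend on the Hadoop and S3 library versions in use, so treat the string match as a heuristic. looksLikePermissionError and describeCount are hypothetical names.

```scala
import scala.util.{Failure, Success, Try}

// Heuristic (assumption): a permission failure mentions "403" somewhere
// in the exception or its chain of causes. Inspect at most 20 levels.
def looksLikePermissionError(t: Throwable): Boolean =
  Iterator.iterate(t)(_.getCause)
    .takeWhile(_ != null)
    .take(20)
    .exists(e => Option(e.getMessage).exists(_.contains("403")))

// Evaluate the count lazily and turn the outcome into a readable message.
def describeCount(count: => Long): String =
  Try(count) match {
    case Success(n) => s"Record count: $n"
    case Failure(e) if looksLikePermissionError(e) =>
      "Access denied (403): check the keys and the GetObject permission"
    case Failure(e) => s"Read failed: ${e.getMessage}"
  }

// In spark-shell, with input_file defined as above:
// println(describeCount(sc.textFile(input_file).count))
```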

