Spark packages from a password protected Repository

11 Dec, 2017

At my current client, we use Sonatype Nexus to store our artifacts. The repository is secured with a username and password, for both publishing and downloading artifacts.

Spark supports additional repositories through the --repositories option.

We use it like this:

pyspark \
  --repositories https://readonly:secret_password@nexus/repository/maven-public/ \
  --packages com.example:foobar:1.0.0

Unfortunately, we ran into the following issue:

    ==== repo-1: tried
      https://readonly:secret_password@nexus/repository/maven-public/com/example/foobar/1.0.0/foobar-1.0.0.pom
      -- artifact com.example#foobar;1.0.0!foobar.jar:
      https://readonly:secret_password@nexus/repository/maven-public/com/example/foobar/1.0.0/foobar-1.0.0.jar
        ::::::::::::::::::::::::::::::::::::::::::::::
        ::          UNRESOLVED DEPENDENCIES         ::
        ::::::::::::::::::::::::::::::::::::::::::::::
        :: com.example#foobar;1.0.0: not found
        ::::::::::::::::::::::::::::::::::::::::::::::
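
The two URLs in the log follow the standard Maven repository layout: dots in the group id become slashes, followed by the artifact id, the version, and a file named artifact-version.extension. A minimal sketch of that mapping, useful for reproducing the URL by hand (the function name maven_artifact_url is made up for this example):

```python
def maven_artifact_url(repo_root: str, coordinate: str, extension: str = "jar") -> str:
    """Map a 'group:artifact:version' coordinate to its URL in a Maven-layout repository."""
    group, artifact, version = coordinate.split(":")
    group_path = group.replace(".", "/")  # com.example -> com/example
    filename = f"{artifact}-{version}.{extension}"
    return f"{repo_root.rstrip('/')}/{group_path}/{artifact}/{version}/{filename}"


root = "https://nexus/repository/maven-public/"
print(maven_artifact_url(root, "com.example:foobar:1.0.0", "pom"))
print(maven_artifact_url(root, "com.example:foobar:1.0.0"))
```

This is only meant to show where the paths come from; it does not cover classifiers or snapshot versions.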

The strange thing is that the URL is correct. With curl we can download the dependency:

curl -s -o /dev/null -v https://readonly:secret_password@nexus/repository/maven-public/com/example/foobar/1.0.0/foobar-1.0.0.pom
* Hostname was NOT found in DNS cache
*   Trying 35...
* Connected to foobar.com (35.xxx.xxx.x) port 443 (#0)
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
...
...
200 OK

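Okay, let's debug this thing by using Ivy directly.

Ivy reads its repository configuration from a settings file, so I created an ivy.settings with the same URL, credentials included:

<ivysettings>
  <settings defaultResolver="nexus"/>
  <property name="nexus-public"
            value="https://readonly:secret_password@nexus/repository/maven-public"/>
  <resolvers>
    <ibiblio name="nexus" m2compatible="true" root="${nexus-public}"/>
  </resolvers>
</ivysettings>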

curl -L -o ivy-2.4.0.jar "http://search.maven.org/remotecontent?filepath=org/apache/ivy/ivy/2.4.0/ivy-2.4.0.jar"
java -jar ivy-2.4.0.jar -settings ivy.settings -dependency com.example foobar 1.0.0 -debug

Here we end up with the same issue. So the problem is not in Spark, but in Ivy.

    ==== nexus: tried
      https://readonly:secret_password@nexus/repository/maven-public/com/example/foobar/1.0.0/foobar-1.0.0.pom
      -- artifact com.example#foobar;1.0.0!foobar.jar:
      https://readonly:secret_password@nexus/repository/maven-public/com/example/foobar/1.0.0/foobar-1.0.0.jar
        ::::::::::::::::::::::::::::::::::::::::::::::
        ::          UNRESOLVED DEPENDENCIES         ::
        ::::::::::::::::::::::::::::::::::::::::::::::
        :: com.example#foobar;1.0.0: not found
        ::::::::::::::::::::::::::::::::::::::::::::::

With the -debug option we find the following:

HTTP response status: 401 url=https://readonly:secret_password@nexus/repository/maven-public/com/example/foobar/1.0.0/foobar-1.0.0.jar
CLIENT ERROR: Unauthorized url=https://readonly:secret_password@nexus/repository/maven-public/com/example/foobar/1.0.0/foobar-1.0.0.jar
    nexus: resource not reachable for com/example#foobar;1.0.0: res=https://readonly:secret_password@nexus/repository/maven-public/com/example/foobar/1.0.0/foobar-1.0.0.jar
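
The 401 suggests that Ivy passes the URL along without turning the user:password@ part into an Authorization header, whereas curl does that translation for us. A rough sketch of what that translation looks like, assuming plain HTTP Basic authentication (the helper name split_credentials is made up for this example):

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_credentials(url: str):
    """Return (url_without_userinfo, basic_auth_header_value_or_None)."""
    parts = urlsplit(url)
    if parts.username is None:
        return url, None

    # Rebuild the network location without the user:password@ prefix.
    host = parts.hostname
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean_url = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))

    # Basic authentication is base64("username:password").
    user = unquote(parts.username)
    password = unquote(parts.password or "")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return clean_url, f"Basic {token}"


url, header = split_credentials("https://readonly:secret_password@nexus/repository/maven-public/")
print(url)
print("Authorization:", header)
```

A client that skips this step sends no credentials at all, which would match the Unauthorized response in the debug log.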

Now that we understand the issue, we can start googling. I found this StackOverflow question.

So let's change the basic authentication in the URL to a credentials block.


<ivysettings>
  <settings defaultResolver="nexus"/>
  <property name="nexus-public"
            value="https://nexus/repository/maven-public"/>
  <credentials host="nexus" realm="Sonatype Nexus Repository Manager"
               username="readonly" passwd="secret_password"/>
  <resolvers>
    <ibiblio name="nexus" m2compatible="true" root="${nexus-public}"/>
  </resolvers>
</ivysettings>
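
If you would rather not keep the password in a file under version control, the settings could be rendered at deploy time instead. A minimal sketch using only the Python standard library; the function name build_ivy_settings is made up for this example, and it passes the repository URL directly as the resolver root instead of going through a property:

```python
import xml.etree.ElementTree as ET


def build_ivy_settings(host, realm, username, password, repo_url):
    """Render an Ivy settings document with a credentials block and one ibiblio resolver."""
    root = ET.Element("ivysettings")
    ET.SubElement(root, "settings", {"defaultResolver": "nexus"})
    ET.SubElement(root, "credentials", {
        "host": host,
        "realm": realm,  # has to match the realm the server announces
        "username": username,
        "passwd": password,
    })
    resolvers = ET.SubElement(root, "resolvers")
    ET.SubElement(resolvers, "ibiblio", {
        "name": "nexus",
        "m2compatible": "true",
        "root": repo_url,
    })
    return ET.tostring(root, encoding="unicode")


print(build_ivy_settings(
    host="nexus",
    realm="Sonatype Nexus Repository Manager",
    username="readonly",
    password="secret_password",  # in practice, read this from a secret store
    repo_url="https://nexus/repository/maven-public",
))
```

Writing the returned string to a file such as ivy.settings gives Ivy the same information as the hand-written version.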

Now everything works like a charm. Time to fix the pyspark command.

pyspark \
  --packages com.example:foobar:1.0.0 \
  --conf spark.jars.ivySettings=/tmp/ivy.settings
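
The same two settings can also be supplied from a Python script rather than on the command line. This is a sketch under the assumption that no Spark JVM is running yet when the session is created, since package resolution happens at startup; the helper names and the application name are made up for this example:

```python
def ivy_spark_conf(packages, ivy_settings_path):
    """Build the Spark properties that correspond to --packages and the Ivy settings file."""
    return {
        "spark.jars.packages": ",".join(packages),
        "spark.jars.ivySettings": ivy_settings_path,
    }


def create_session(conf):
    """Create a SparkSession with the given properties applied."""
    # Imported lazily so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("nexus-packages")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = ivy_spark_conf(["com.example:foobar:1.0.0"], "/tmp/ivy.settings")
print(conf)
```

Calling create_session(conf) should then behave like the pyspark command above, with the difference that the configuration lives in code.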

Now Spark is able to download the packages as well. I'm a happy camper again.
What is left for us to do is add this to our init script, so that new Dataproc clusters are initialized with this setup.
