Sunday, October 04, 2015

Trying AKKA, Java, Scala, IntelliJ, Aerospike, Maven & Git with BitBucket on Windows

Disclaimer: this is not a clean "how-to guide" for any of the above tools or technologies. It's more a record of my back-and-forth while exploring this route: a list of thoughts, findings and decisions I made while trying to make things work for me.

I wanted to try some AKKA / Scala code, which is new to me, and thought of writing a sample that puts something into and gets something from Aerospike DB. My primary machine runs Windows 8.1, and I run VirtualBox with a variety of Linux distributions.

Reading the docs, I understood that installing Aerospike on Windows does not really make sense, since there is no native Windows build of Aerospike. Instead, the DB comes with a Vagrant VM and would run inside VirtualBox anyway, so I simply installed Aerospike on a VirtualBox Ubuntu VM following these simple steps: http://www.aerospike.com/docs/operations/install/linux/ubuntu/.

It worked like a charm on the first try, so I went on to the next step: AKKA/Scala/Java and IntelliJ. I installed the latest IntelliJ Community (v15 Preview) https://www.jetbrains.com/idea/download/
and Typesafe Activator at http://akka.io/downloads/.

Activator is a web server packed with documentation, explanations and samples, and it can generate ready-to-use Eclipse and IntelliJ projects for each sample, providing both Java and Scala code in every project. When you start it via the supplied "./activator.bat" file, it opens a new browser tab showing the list of samples and other content. I am quite sure it is built with Typesafe's Play framework inside. I found it a pretty nice thing to start with.

So I went to the "Hello Akka" sample in my browser and generated an IntelliJ project, which I opened in the brand new IntelliJ. I wanted the project to be managed by Maven, so I needed to add the Aerospike dependencies to the Maven "pom.xml" file. But there was no "pom.xml" in the IntelliJ project generated by Typesafe Activator. Right-clicking on the "hello-akka" project name -> "Add Framework Support" -> selecting "Maven" added a "pom.xml" to my new project (source), so I could add the dependency there


<dependencies>
  <dependency>
    <groupId>com.aerospike</groupId>
    <artifactId>aerospike-client</artifactId>
    <version>3.1.5</version>
  </dependency>
</dependencies>

as per the Aerospike documentation. Aerospike currently does not have a Scala client library, so my initial plan of writing the tests in Scala did not work out: I have to write the client code in Java.

If you do not have a working stand-alone Maven on your computer, download and install it, so you can build the Aerospike examples either with "mvn package" or with the provided "./build_all" script (both commands seem to do exactly the same thing) right from the command line. You will also need a working Git install on your Windows computer.

The default Maven settings location on Windows 8+ is "C:\Users\\.m2". The remote repositories are specified in the "settings.xml" file (details). Your $MAVEN_HOME/conf folder contains a default "settings.xml" file, which you can copy into your "Users" folder if something goes wrong with the user's copy.
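
To check which per-user settings file Maven is expected to pick up, a small sketch like the one below can help. It assumes the conventional layout (a ".m2" folder with "settings.xml" directly under the user's home directory); the class name MavenSettingsLocation is made up for this illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MavenSettingsLocation {

    // Builds the conventional per-user settings path: <user home>/.m2/settings.xml
    // (assumption: the default location has not been overridden on the command line)
    static Path userSettings(String userHome) {
        return Paths.get(userHome, ".m2", "settings.xml");
    }

    public static void main(String[] args) {
        Path settings = userSettings(System.getProperty("user.home"));
        System.out.println("User settings: " + settings);
        System.out.println("Exists: " + Files.exists(settings));
    }
}
```

If "Exists" prints false, Maven simply falls back to the global file in its own conf folder.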

In order to work with a remote Git repository such as GitHub or Bitbucket you will need SSH access to it, otherwise you will be prompted for a password on each Git push. You need an SSH key for that. This manual explains how to generate / find your SSH keys for GitHub, and this one is the similar explanation for BitBucket.
If you do not have a key, or if you need yet another SSH key (multiple Bitbucket accounts require multiple SSH keys, one per account), you can run this command in Git Bash to generate a new key:

ssh-keygen -t rsa -f ~/.ssh/ -C "my@email.com"

After creating the SSH key open Git Bash and create a ~/.bashrc file with the following content:

SSH_ENV=$HOME/.ssh/environment
 
# start the ssh-agent
function start_agent {
    echo "Initializing new SSH agent..."
    # spawn ssh-agent
    /usr/bin/ssh-agent | sed 's/^echo/#echo/' > "${SSH_ENV}"
    echo succeeded
    chmod 600 "${SSH_ENV}"
    . "${SSH_ENV}" > /dev/null
    /usr/bin/ssh-add
}
 
if [ -f "${SSH_ENV}" ]; then
     . "${SSH_ENV}" > /dev/null
     ps -ef | grep ${SSH_AGENT_PID} | grep ssh-agent$ > /dev/null || {
        start_agent;
    }
else
    start_agent;
fi


Next you can print and copy the content of the SSH key file

cat ~/.ssh/

and paste it to BitBucket or GitHub web UI for your account or your team. Also note this Bitbucket article for further help on multiple accounts / SSH keys.

For multiple BitBucket accounts we need to create a ~/.ssh/config file with something like this (note the indent: it is conventional and keeps the host blocks readable):

Host personal
 User me
 Hostname bitbucket.org
 PreferredAuthentications publickey
 IdentitiesOnly yes
 IdentityFile ~/.ssh/id_personal
Host work
 User work_user
 Hostname bitbucket.org
 PreferredAuthentications publickey
 IdentitiesOnly yes
 IdentityFile ~/.ssh/id_work

and your Git commands should address these new hosts instead of the default 'bitbucket.org' or 'github.com':

git clone git@personal:myteam/repo.git

An alternative to the Git clone command above could be a Git fetch (the 2nd answer here), as listed below:

git init
git remote add origin git@personal:myteam/repo.git
git fetch origin
git checkout -b master --track origin/master

Next I had to switch off my VBox Aerospike VM and add "Port Forwarding" (VBox -> select VM -> Settings -> Network -> select your current network adapter -> Advanced), forwarding my machine's port 3000 to the VM's port 3000, so I can access the DB from the code by pointing to the "127.0.0.1" localhost.



Other suggestions on how to access a remote service on a VirtualBox VM did not work for me, so I ended up with port forwarding for now. Now start the VM again and manually run Aerospike:

sudo service aerospike start

Now wait a few seconds and try the following in an XTerm in your VM:

telnet localhost 3000

If everything is OK, you should connect to your local Aerospike DB. Next, open "cmd.exe" on your host machine (install the Microsoft telnet utility for your Windows if you have not done so yet) and run:

telnet localhost 3000

on your Windows machine. Assuming the port forwarding was defined properly, you should now be connected to Aerospike on your VirtualBox VM.
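
The same telnet-style check can be done from Java, which is handy if you do not want to install the telnet utility on Windows. This is only a minimal sketch using plain JDK sockets: it tests that the TCP port accepts connections, not that Aerospike itself is healthy. The class name PortCheck and the 2 second timeout are my own choices for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    // Returns true if a TCP connection to host:port can be opened within timeoutMs.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not resolvable
            return false;
        }
    }

    public static void main(String[] args) {
        // 127.0.0.1:3000 is the forwarded Aerospike port described above
        boolean ok = isReachable("127.0.0.1", 3000, 2000);
        System.out.println(ok ? "Port 3000 is reachable" : "Port 3000 is NOT reachable");
    }
}
```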

Now it's time to get back to IntelliJ, but first take a break and read "Java client best practices" before moving on to coding with Aerospike. My first try with the Aerospike Java client looks like this:

import com.aerospike.client.*;
import com.aerospike.client.policy.BatchPolicy;
import com.aerospike.client.policy.ClientPolicy;
import com.aerospike.client.policy.WritePolicy;

public class MyAerospikeTest {

    public static void main( String[] argv){
        Host[] hosts = new Host[] {
                new Host("127.0.0.1", 3000),
//                new Host("another.host", 3000),
//                new Host("and.another.host", 3000)
        };

        AerospikeClient client = new AerospikeClient( new ClientPolicy(), hosts);

        // write a new value following the example: http://www.aerospike.com/docs/client/java/usage/kvs/write.html
        // Initialize writePolicy.
        WritePolicy writePolicy = new WritePolicy();
        writePolicy.timeout = 5;  // 5 millisecond timeout.

        // Write multiple values.
        Key key = new Key( "test", "myset", "mykey");
        Bin bin1 = new Bin( "name", "John");
        Bin bin2 = new Bin( "age", 25);
        client.put( writePolicy, key, bin1, bin2);

        System.out.println( "Added a new record to Aerospike DB!");

        // Now let's see what we wrote there
        BatchPolicy batchPolicy = new BatchPolicy();
        batchPolicy.timeout = 2; // 2 ms timeout on read

        // Get the record
        Record whatWeWrote = client.get( batchPolicy, key);
        System.out.println( "Got the record from DB");

        // get some bins from the record (note to get just two bins "name" and "age"
        // we could use: Record record = client.get(policy, key, "name", "age"); )
        System.out.println( "Name: " + whatWeWrote.bins.get("name") + ", Age: " + whatWeWrote.bins.get("age"));

        // delete the record
        client.delete( writePolicy, key);
        System.out.println( "And now removed the record from DB");

        client.close();

        System.out.println( "Closed connection and exited. See you soon");
    }
}


and when it runs it produces the expected output:

Added a new record to Aerospike DB!
Got the record from DB
Name: John, Age: 25
And now removed the record from DB
Closed connection and exited. See you soon

Process finished with exit code 0


So far so good. Moving forward, I was quite surprised to read about the configuration challenges associated with Aerospike namespaces. The storage recipes added even more confusion: which storage configuration should I request for my use cases, and how am I supposed to know that? I definitely do not want to run into some namespace limit, neither in terms of defined space nor in access time / latency once the data set grows. For now I have left it for future research, since I still need to move on with putting all the things to work together.

The good surprise was Secondary Indexes and the query language, including the aql tool. Clustering and monitoring for the Community Edition also look promising. And the Java docs are very comprehensive.

And here is yet another surprise: it appears that IntelliJ (including the Community Edition) can convert any Java file to Scala once "Add framework support -> Scala" is switched on for a Project / Module (Refactor menu / CTRL+SHIFT+G). Meaning, you can take the open-source Aerospike client and rebuild it in Scala, and then contribute a Scala plugin back to Aerospike. Or you can write Java code around the Aerospike Java library and convert only your own code to Scala with IntelliJ. And of course you can simply write your Scala code and call the Aerospike Java client from there as well.

In IntelliJ, in order to make module A in a project aware of the packages / classes of module B, we need to add an "A on B dependency". For Maven projects / modules it is easy: add something like

<dependency>
    <groupId>my-group-id</groupId>
    <artifactId>my-artifact-id</artifactId>
    <version>1.0</version>
</dependency>

to the pom.xml file of "A", where the IDs above are the ones specified for your module "B".

Aerospike namespaces can only be added by adding the namespace definition to the
 /etc/aerospike/aerospike.conf
file. For the possible configuration parameters please refer to http://www.aerospike.com/docs/reference/configuration/. After the new namespace is configured, restart the Aerospike server.
Also note this post to avoid issues when adding namespaces. In general, namespaces can be RAM (with or without HDD for persistence) and SSD / Flash.

In the same way, the only option to delete a namespace in Aerospike is stopping the service, deleting the namespace from the configuration file, removing the associated file and then restarting the server.

Quite a lot of work compared to the DROP DATABASE AAA
MySQL SQL statement.

I found this nice Scala (wrapper) library for Aerospike: https://github.com/Tapad/scaerospike
and their blog post on scaling Aerospike:
http://engineering.tapad.com/blog/2014/08/aerospike-at-tapad-how-we-scale/

More about scaling up: http://www.slideshare.net/AerospikeDB/linked-in-twitter-facebook-google-email-embed-configuring-aerospike-part-2
Slide 7: in the clustered environment adding / removing a namespace requires cluster-wide restart
Slide 8: keep nodes identical to run at the full cluster's capacity
Slide 9, 24: as mentioned by the Tapad guys in the above post, high-watermark and default TTL look to be the most important namespace configuration parameters. TTL / Expiration logic is explained at Slide 25. Note that if a record is updated, the TTL is restarted.
Slide 11: A single record always belongs to a single node
Slide 28: At the high watermark the server drops the items with the closest TTL.
Slide 29: Aerospike can stop writing new records if "stop-writes" is reached.
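
To make the TTL, high-watermark and stop-writes notes from the slides more tangible for myself, here is a toy model in plain Java. It is not how Aerospike implements any of this: real thresholds are configured as percentages of memory / disk, while this sketch simply counts records, and the class TtlEvictionSketch with all its method names is invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TtlEvictionSketch {

    private final Map<String, Long> expiry = new HashMap<>(); // key -> absolute expiry time
    private final int highWaterMark; // record count above which evict() removes records
    private final int stopWrites;    // record count at which new records are refused
    private final long defaultTtl;   // TTL applied on every write

    TtlEvictionSketch(int highWaterMark, int stopWrites, long defaultTtl) {
        this.highWaterMark = highWaterMark;
        this.stopWrites = stopWrites;
        this.defaultTtl = defaultTtl;
    }

    // Writes or updates a key at time 'now'; an update restarts the TTL.
    // Returns false if the key is new and the stop-writes limit is reached.
    boolean put(String key, long now) {
        if (!expiry.containsKey(key) && expiry.size() >= stopWrites) {
            return false;
        }
        expiry.put(key, now + defaultTtl);
        return true;
    }

    // Drops the records closest to expiry until the count is back at the high watermark.
    // Returns the evicted keys in eviction order.
    List<String> evict() {
        List<String> evicted = new ArrayList<>();
        while (expiry.size() > highWaterMark) {
            String soonest = Collections.min(expiry.entrySet(), Map.Entry.comparingByValue()).getKey();
            expiry.remove(soonest);
            evicted.add(soonest);
        }
        return evicted;
    }

    int size() {
        return expiry.size();
    }

    public static void main(String[] args) {
        TtlEvictionSketch ns = new TtlEvictionSketch(2, 4, 100);
        ns.put("a", 0);
        ns.put("b", 10);
        ns.put("c", 20);
        ns.put("a", 30);                     // update: "a" now expires at 130 instead of 100
        System.out.println(ns.put("d", 40)); // true: fourth record is still accepted
        System.out.println(ns.put("e", 50)); // false: stop-writes limit reached
        System.out.println(ns.evict());      // [b, c]: the two records closest to expiry
    }
}
```

Note how "a" survives the eviction although it was written first: the update at time 30 restarted its TTL, so "b" and "c" are closer to expiry.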


For now I have everything up and running with Java and Aerospike's handy Java API. I still have to look into Tapad's open source Scala Aerospike client.
