Modular Java BeJUG conference

The two presenters were Bert Ertman and Paul Bakker, both working for Luminis, Bert being an official Oracle Java Champion.

They began their talk by pointing out the following two trends:

  • Applications keep getting bigger and are thus more complex.
  • More and more teams adopt, at least partly, agile methodologies.

Those trends bring new challenges along with them:

  • dependency management, to manage the application’s dependencies on its libraries;
  • versioning of the application, which must fit with other enterprise projects that have their own, parallel lifecycles;
  • long-term maintenance, which becomes difficult (“it’s difficult to refactor level 0 of a 20-story skyscraper”);
  • deployment.

As applications move to the cloud, other non-functional requirements enter the game too:

  • Your users never sleep, so you must deploy with zero downtime.
  • Deploying a big monolithic application to the cloud takes time.
  • Customer-specific extensions must be deployable on a Software as a Service application.

Modularity is the answer to those issues.

What is a module?

Every decent programmer probably remembers this from their classes: a good design has loose coupling and high cohesion.

Coupling is kept low through the use of interfaces and by hiding the actual implementations. Each module has its public and private parts, and other modules cannot “touch its private parts”.

But then comes the issue of instantiating the actual classes. A popular way to solve that problem is dependency injection. But you can also use some kind of service registry, which you can ask to provide you with an implementation of type X. Each module notifies the registry of the interfaces it provides implementations for and of the interfaces it consumes. That service registry can manage multiple versions of interfaces and choose the best match.
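As an illustration, here is a minimal sketch of that pattern with the plain OSGi service registry (the Greeter interface, its implementation and the activator are made up for the example; only the BundleContext calls come from the actual OSGi API):

```java
import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceReference;

// Hypothetical public interface of the module (in an exported package).
interface Greeter {
    void greet(String name);
}

// Hypothetical private implementation (in a non-exported package).
class GreeterImpl implements Greeter {
    public void greet(String name) {
        System.out.println("Hello, " + name);
    }
}

// The activator wires the module into the service registry when the bundle starts.
public class GreeterActivator implements BundleActivator {

    public void start(BundleContext context) {
        // Publish the implementation under the interface name:
        // consumers never see the concrete class.
        context.registerService(Greeter.class.getName(), new GreeterImpl(), null);

        // Consuming side: ask the registry for *an* implementation of Greeter.
        ServiceReference ref = context.getServiceReference(Greeter.class.getName());
        if (ref != null) {
            Greeter greeter = (Greeter) context.getService(ref);
            greeter.greet("BeJUG");
        }
    }

    public void stop(BundleContext context) {
        // Services registered by this bundle are unregistered automatically on stop.
    }
}
```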

With a little help from design-time modularity, we can thus tame the spaghetti syndrome of even the most complex applications.

Runtime implementation

When we analyse the same issues from the runtime point of view, the JAR file becomes THE unit. So how do we deal with module versioning and inter-module dependency management? How can we replace just one module at runtime?

The first part of the answer is: put the architectural focus on modularity! If you don’t group coherent functionality together, there’s no point in using the second part of the answer, which is: use a modular framework like OSGi (Jigsaw can be seen as an alternative to OSGi but is far less mature). But keep in mind that a modular framework is no guarantee; the key is a modular architecture.

How well does modular Java play in the JavaEE game?

JavaEE is high-level. OSGi is low-level and provides no enterprise services (transactions, security, remote access, persistence, …) on its own. So you’re left with three options:

  • Deploying enterprise services as published OSGi services.
  • A hybrid approach with a classic JavaEE part + a modular part + a bridge between the 2 containers.
  • An OSGi “à la carte” approach.

The first option involves application servers that publish their services as OSGi modules. Glassfish does that. There is now a concept of WAB (Web Application Bundle) files, which are WAR files whose servlets are deployed as OSGi resources. A similar system exists for EJB bundles, but there is no standard yet.
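What essentially turns a WAR into a WAB is its OSGi metadata. A minimal MANIFEST.MF could look like the following sketch (the bundle name, version and context path are made up for the example):

```
Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-SymbolicName: com.example.mywebapp
Bundle-Version: 1.0.0
Bundle-ClassPath: WEB-INF/classes
Web-ContextPath: /mywebapp
Import-Package: javax.servlet,javax.servlet.http
```

The Web-ContextPath header is what marks the archive as a WAB and tells the container where to mount it.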

The second option can be implemented with Weld as the CDI container, providing CDI inside the bundle and consuming and producing OSGi services.

I’ll complete the description of the third option when I get access to the Parleys video (sorry, I ran out of ink at that moment, and writing this article two weeks after the conference doesn’t help either).

Deploying to the cloud

The main concern here is that a modular application may involve hundreds of bundles. Fortunately, something like the Apache ACE platform can help you manage this and gives you the option to redeploy one module at a time.

Demos

I won’t go into the details of the demo. The easiest is to wait for the Parleys video, as it will be far more interesting than me trying to write down what I’ve seen.


Fork/Join and Akka: parallelizing the Java platform BeJUG conference

Here is another article on a BeJUG conference. This time, Sander Mak, software developer and architect at Info Support, The Netherlands, gave us an overview of two concurrency frameworks: Fork/Join and Akka.

Fork/Join is a framework that was added to the standard libraries starting with JDK 7. Akka is an open-source alternative with an emphasis on the resilience of concurrent processes.

Fork/join

Setting the scene

CPU speed has been growing over the last years until reaching some kind of technical limit at around 3.5 GHz. Right now, a CPU mainly idles while waiting for its I/O. That’s why the new trend is to have multiple cores.

But as Sander quoted: “the number of idle cores in my machine doubles every two years”. There is an architectural mismatch, because developers tend to believe that the compiler and/or the JVM can handle parallelism on their own. Unfortunately, this isn’t true.

Demos

The first demos are variations on the computation of the Fibonacci sequence, whose definition is fib(0) = 0, fib(1) = 1 and fib(n) = fib(n-1) + fib(n-2) for n > 1.

Of course, the objective here is not to find an optimal solution to that problem (transforming the recursive definition into an iterative form) but simply to apply concurrent computation to the recursive form.

  1. We can solve this problem by creating one thread to compute fib(n-1) and another thread for fib(n-2), then waiting until both have finished their computation and adding the results.
    Immediately, the number of threads explodes.
  2. If we implement the same algorithm with two Callable objects and a thread pool, the number grows more slowly but is still high.
    The problem is that the current thread blocks while its two children finish.
  3. With Fork/Join, the task dependency is explicit and thus the join method call doesn’t block the current thread (see the sketch below).
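To make the third approach concrete, here is a minimal RecursiveTask sketch of a Fork/Join Fibonacci; it illustrates the idea and is not the actual demo code (that one is linked below):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// A minimal Fork/Join Fibonacci, for illustration only.
public class Fib extends RecursiveTask<Long> {

    private final int n;

    public Fib(int n) {
        this.n = n;
    }

    @Override
    protected Long compute() {
        if (n <= 1) {
            return (long) n;
        }
        Fib f1 = new Fib(n - 1);
        Fib f2 = new Fib(n - 2);
        f1.fork();                     // push fib(n-1) onto this worker's task queue
        f2.fork();                     // same for fib(n-2); idle workers may steal them
        return f1.join() + f2.join();  // join() lets the worker run queued tasks instead of blocking
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(); // sized to the number of cores by default
        System.out.println(pool.invoke(new Fib(30)));
    }
}
```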

Fork/Join works with an auto-sized thread pool. Each worker thread is assigned a task queue, which gets fed by the fork method calls. The interesting behavior is that a worker is allowed to steal work from the task queue of another worker.

Another, more advanced demo was also performed, demonstrating how to make a dichotomic search on a large set of cities to find which ones are within a certain distance from a point. Of course, the algorithm is implemented with Fork/Join.

All the code of those examples is available on http://bit.ly/bejug-fj.

API & patterns

Problem structure

The algorithm must be acyclic (no task may depend on another task that is already present in its call stack) and CPU-bound. I/O-bound problems wait a lot on blocking system calls and thus prevent those threads from performing other tasks.

Sequential cutoff

To avoid the task-management overhead consuming all the computation time, you must set a threshold that decides whether a problem should be solved sequentially or in parallel. This leads to defining work chunks that are processed in parallel, while the steps inside a chunk are processed sequentially.
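Applied to the Fibonacci sketch above, the cutoff could look like this (the threshold of 13 is an arbitrary value chosen for the example):

```java
private static final int SEQUENTIAL_CUTOFF = 13; // arbitrary threshold for the example

@Override
protected Long compute() {
    if (n < SEQUENTIAL_CUTOFF) {
        return fibSequential(n); // small chunk: plain recursion, no forking overhead
    }
    Fib f1 = new Fib(n - 1);
    Fib f2 = new Fib(n - 2);
    f1.fork();
    f2.fork();
    return f1.join() + f2.join();
}

private long fibSequential(int n) {
    return n <= 1 ? n : fibSequential(n - 1) + fibSequential(n - 2);
}
```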

Fork once, fool me twice

Some algorithm implementations allow reusing the current thread to do some computation instead of forking a new task, thus limiting the overhead.
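In the Fibonacci sketch, this means forking only one of the two subtasks and computing the other one directly in the current thread, something like:

```java
@Override
protected Long compute() {
    if (n <= 1) {
        return (long) n;
    }
    Fib f1 = new Fib(n - 1);
    f1.fork();                               // only fib(n-1) goes to the task queue
    long result2 = new Fib(n - 2).compute(); // fib(n-2) reuses the current thread
    return f1.join() + result2;
}
```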

Convenience methods

There exist convenience methods:

  • Method invoke() is semantically equivalent to fork(); join() but always attempts to begin execution in the current thread.
  • Method invokeAll performs the most common form of parallel invocation: forking a set of tasks and joining them all.
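With invokeAll, the compute body of the earlier sketch could also be written like this:

```java
@Override
protected Long compute() {
    if (n <= 1) {
        return (long) n;
    }
    Fib f1 = new Fib(n - 1);
    Fib f2 = new Fib(n - 2);
    invokeAll(f1, f2);             // forks the tasks and waits for both to complete
    return f1.join() + f2.join();  // both are done; join() just fetches the results
}
```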

Future and comparisons

Fork/Join creates threads. It is thus currently forbidden by the EJB spec. When it comes to the CDI or servlet specs, we are navigating in some kind of grey zone. Maybe this could work with a JCA work manager. @Asynchronous could be used as an alternative.
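As a reminder of what that alternative looks like, here is a minimal sketch of an EJB 3.1 @Asynchronous method (FibService is a hypothetical bean, not something shown in the talk):

```java
import java.util.concurrent.Future;
import javax.ejb.AsyncResult;
import javax.ejb.Asynchronous;
import javax.ejb.Stateless;

// Hypothetical bean: the container, not the application, supplies the thread.
@Stateless
public class FibService {

    @Asynchronous
    public Future<Long> fib(int n) {
        // Runs on a container-managed thread; the caller gets a Future immediately.
        return new AsyncResult<Long>(fibSequential(n));
    }

    private long fibSequential(int n) {
        return n <= 1 ? n : fibSequential(n - 1) + fibSequential(n - 2);
    }
}
```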

Anyway, it is foreseen that the JavaEE 7 spec may include the java.util.concurrent package.

Compared with Fork/Join, the more classic ExecutorService doesn’t allow work stealing. It is better suited to coarse-grained, independent tasks. Bounded thread pools support blocking I/O better.

MapReduce implementations are targeted at clusters, not at a single JVM. While Fork/Join is targeted at recursive work, MapReduce often works on a single map. Furthermore, MapReduce has no inter-node communication and thus doesn’t allow work stealing.

The popular criticisms of Fork/Join are:

  • The implementation is really complex.
  • Its scalability is unknown above 100 cores, which may seem like a lot for a CPU but is far below current standards for a GPU.
  • The one-to-one mapping between the thread pool size and the number of cores is probably too low-level.

With the availability of JDK 8, the Fork/Join API could be extended with methods on collections working with lambdas. There is also a CountedCompleter ForkJoinTask implementation that is better at working with I/O-based tasks and that is currently contained in the JSR-166 extras.

Akka

Introduction

Akka is a concurrency framework written in Scala that offers a Java API. I couldn’t introduce Akka better than its authors, so here is what they say about it:

We believe that writing correct concurrent, fault-tolerant and scalable applications is too hard. Most of the time it’s because we are using the wrong tools and the wrong level of abstraction. Akka is here to change that. Using the Actor Model we raise the abstraction level and provide a better platform to build correct concurrent and scalable applications. For fault-tolerance we adopt the “Let it crash” model which have been used with great success in the telecom industry to build applications that self-heals, systems that never stop. Actors also provides the abstraction for transparent distribution and the basis for truly scalable and fault-tolerant applications.

Actors

An actor is an object with a local state, a local behavior and a mailbox to receive messages sent to it. An actor processes only one message at a time. And as they are lightweight (around 400 bytes of memory per actor), you can instantiate many of them in a standard JVM heap.

The receive method of an actor is called when the processing of a message begins, and it is where all the work is done. The framework itself is responsible for actor management (thread pooling, dispatching, …).
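Here is a minimal sketch of such an actor, assuming the Akka 2.0-era Java API (GreetingActor and its greeting message are made up for the example):

```java
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.actor.UntypedActor;

// A hypothetical actor: local state plus a receive method, one message at a time.
public class GreetingActor extends UntypedActor {

    // Local state; safe without locks because messages are processed sequentially.
    private int greetCount = 0;

    @Override
    public void onReceive(Object message) {
        if (message instanceof String) {
            greetCount++;
            System.out.println("Hello, " + message + " (" + greetCount + ")");
        } else {
            unhandled(message);
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("demo");
        ActorRef greeter = system.actorOf(new Props(GreetingActor.class), "greeter");
        greeter.tell("BeJUG"); // the message lands in the actor's mailbox
        system.shutdown();
    }
}
```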

Using a ForkJoinPool with Akka scales very well.

A demo of Akka usage is available on http://bit.ly/bejug-akka.
