Tag Archives: Java

Hadoop data-mining swiss army knife by @plopezFr and @BertrandDechoux at #devoxx #DV13-HadoopCode

Hadoop data-mining swiss army knife

The website voyages-sncf.com sells half of the Thalys tickets in France and is one of the most visited websites in Europe. That high load generates a huge amount of logs which, at first, were not used. After some time, they wanted to seek value in those logs and started investigating solutions for distributed computing.


Practical RESTful persistence by @shaunmsmith at #devoxx

Practical RESTful persistence

EclipseLink provides a JPA-RS implementation. Let’s see what hides behind this.

The use case is a web application with the main logic inside the browser and a backend providing only persistence. We want to provide a REST API to that persistence layer.

Java 8 and beyond #devoxx #dv13 by @mreinold and @BrianGoetz

Java 8 and beyond

Keynote given at Devoxx 2013 by Mark Reinhold and Brian Goetz

Java has existed for 18 years. Nothing stays in the field that long without evolving.

One of the latest evolutions in computer science is the multiplication of cores and pipelines in processors.

For that reason it is interesting to look at problems from a perspective where everything is split into parallel data flows which split again and again until complex problems resolve themselves into a multitude of simple parallel computations.

To support that perspective, closures were added.
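To fix ideas, here is a minimal sketch of that perspective with a Java 8 lambda and a parallel stream (my own illustration, not code from the keynote): the runtime splits the range into chunks, maps each chunk in parallel, and reduces the partial results.

```java
import java.util.stream.LongStream;

public class ParallelSum {
    public static void main(String[] args) {
        // A closure (lambda) passed to a parallel stream: the runtime
        // splits the range into parallel sub-ranges, squares each value,
        // and combines the partial sums.
        long sum = LongStream.rangeClosed(1, 1000)
                             .parallel()
                             .map(n -> n * n)
                             .sum();
        System.out.println(sum); // prints 333833500
    }
}
```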


Environment-specific configuration in a maven-spring-mvc project


For once, this article is not about a conference but about some issues I thought about recently.

The context is a training session I give in my company. In that training session, I use a web app to illustrate some points and let the attendees perform some exercises. The web app project uses maven, spring-mvc and multiple databases (mysql and hsqldb) for different purposes (local playing, local integration tests, CI integration tests and “production”).

Those different purposes have different needs. Ideally, I would like to have something like:

  1. local playing –> local mysql database
  2. local integration tests –> hsqldb in-memory database
  3. CI integration tests –> dedicated mysql database running on the CI server
  4. “production” –> dedicated mysql database running on the “production” server

To handle schema updates, I use Liquibase. I would like to run it in update mode on environments 1, 2 and 4 and in drop-and-create mode on environment 3.
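To make the direction concrete, here is the kind of per-environment switch I have in mind, as a Maven profiles sketch. The profile ids, property names and the liquibase.mode property are all hypothetical, and that property would still have to be consumed somewhere in the build configuration:

```xml
<!-- Sketch only: ids and property names are hypothetical. -->
<profiles>
  <profile>
    <id>local-it</id>
    <properties>
      <jdbc.url>jdbc:hsqldb:mem:testdb</jdbc.url>
      <liquibase.mode>update</liquibase.mode>
    </properties>
  </profile>
  <profile>
    <id>ci-it</id>
    <properties>
      <jdbc.url>jdbc:mysql://ci-server/app</jdbc.url>
      <liquibase.mode>drop-and-create</liquibase.mode>
    </properties>
  </profile>
</profiles>
```

Running `mvn verify -P ci-it` would then select the CI database; the drawback, of course, is that profiles tend to attract exactly the kind of attention I want to avoid in a beginner training.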

I know several solutions that could achieve this, but the solution must be neat (it's a beginner training application and thus shouldn't draw attention to themes I don't want to talk about). I still haven't found THE perfect solution, so I'll describe the ones I've already tried with some degree of success in upcoming articles. Feel free to propose anything in the comments; I'll be glad to investigate your propositions.

Devoxx 2012 Oracle keynote: Make the Future Java

Oracle stated success factors

  • technology innovation
  • community participation
  • Oracle leadership

The work focus is currently put on JavaFX and embedded Java development.

What’s new in JavaSE 8?


The first big change is the inclusion of closures. These adopt the form of

(x, y) -> x + y

Those closures will come with a whole new set of methods, especially on collections, in order to provide some kind of fluent API. There will also be a new keyword default which will allow developers to provide a default implementation in an interface.
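Here is a small sketch of how these two features combine (my own example, not one shown in the keynote): an interface ships a default implementation that a lambda inherits for free, and the fluent collection API chains lambda-taking methods.

```java
import java.util.Arrays;
import java.util.List;

// A "default" method: the interface carries an implementation that
// implementing classes (including lambdas) inherit without extra code.
interface Greeter {
    String name();

    default String greet() {
        return "Hello, " + name();
    }
}

public class DefaultDemo {
    public static void main(String[] args) {
        Greeter g = () -> "Devoxx";           // lambda implements name()
        System.out.println(g.greet());         // prints "Hello, Devoxx"

        // The fluent, lambda-based collection API:
        List<String> talks = Arrays.asList("JavaFX", "Lambdas", "OSGi");
        talks.stream()
             .filter(t -> t.startsWith("J"))
             .forEach(System.out::println);    // prints "JavaFX"
    }
}
```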

Type annotations

Type annotations give further information to the compiler and thus allow it to check some invariants at compile time. Such checks can be nullability checks, immutability checks and so on.
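As a sketch, a hypothetical @NonNull annotation could be declared as follows; ElementType.TYPE_USE (new in Java 8) lets it annotate any use of a type, and a pluggable checker, not javac itself, would then enforce the invariant:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical type annotation: TYPE_USE allows it on any type position
// (local variables, generics, casts, ...), not just on declarations.
@Target(ElementType.TYPE_USE)
@Retention(RetentionPolicy.RUNTIME)
@interface NonNull {}

public class TypeAnnotationDemo {
    public static void main(String[] args) {
        // A checker could verify at compile time that this reference is
        // never assigned null.
        @NonNull String greeting = "never null here";
        System.out.println(greeting.length());
    }
}
```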

Compact profile

This new profile defines a subset of essential libraries that will be part of a reduced Java platform aimed at embedded JVMs where memory consumption and file space are concerns.

JavaEE news

JavaEE 7 will focus on simplicity and HTML5 support. JavaEE 8 will be more oriented towards cloud support and modularity.

And after?

The next goals will be embedding on more and more devices and platforms (e.g. iOS or ARM processors) on the one hand and providing “embedded suites” containing a JVM, an application server and a database on the other.


Now that I am a little rested from three frantic days in Antwerpen, I’ll start to structure my notes and write summaries about presentations I attended at Devoxx 2012. Once again, a big thanks to Stephan Janssen and his team.

Modular Java BeJUG conference

The two presenters were Bert Ertman and Paul Bakker, both working for Luminis, Bert being an official Oracle Java Champion.

They began their talk by stating the following two trends:

  • Applications these days are bigger and bigger and thus are more complex
  • More and more teams adopt, at least partly, agile methodologies

Those trends bring new challenges along with them:

  • dependency management, to manage the dependencies of the application on its libraries.
  • versioning of the application, which must fit with other enterprise projects that have their own, parallel, lifecycle.
  • long-term maintenance becomes difficult (“it’s difficult to refactor level 0 of a 20-story skyscraper”).
  • deployment

As applications are moving to the cloud, other non functional requirements enter the game too:

  • As your users never sleep, you must deploy with 0 downtime.
  • Deploying a big monolithic application on the cloud takes time.
  • Customer-specific extensions must be deployable on a Software as a Service application.

Modularity is the answer to those issues.

What is a module?

Every decent programmer probably remembers this from his classes: a good design has loose coupling but high cohesion.

Coupling is prevented through the use of interfaces and by hiding the actual implementations. Each module has its public and private parts and other modules cannot “touch its private parts”.

But then comes the issue of instantiating the actual classes. A popular way to solve that problem is dependency injection. But you can also do it with some kind of service registry which you can ask to provide an implementation of type X. Each module notifies the registry about the interfaces it provides implementations for and which interfaces it consumes. That service registry can manage multiple versions of interfaces and choose the best match.
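The registry idea can be sketched in a few lines of plain Java. This is my own toy illustration, not OSGi's actual API (the OSGi registry is far richer: versions, filters, dynamic services): modules publish implementations keyed by interface, consumers look them up by interface only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy service registry: publish by interface, look up by interface.
class ServiceRegistry {
    private final Map<Class<?>, Supplier<?>> providers = new HashMap<>();

    <T> void publish(Class<T> iface, Supplier<T> provider) {
        providers.put(iface, provider);
    }

    <T> T lookup(Class<T> iface) {
        Supplier<?> p = providers.get(iface);
        if (p == null) throw new IllegalStateException("no provider for " + iface);
        return iface.cast(p.get());
    }
}

interface GreetingService { String greet(String who); }

public class RegistryDemo {
    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        // The providing module publishes its implementation...
        registry.publish(GreetingService.class, () -> who -> "Hello " + who);
        // ...and a consuming module never sees the concrete class.
        GreetingService svc = registry.lookup(GreetingService.class);
        System.out.println(svc.greet("OSGi")); // prints "Hello OSGi"
    }
}
```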

With a little design-time modularity, we can thus tame the spaghetti syndrome of even the most complex applications.

Runtime implementation

When we analyse the same issues from the runtime view, the JAR file becomes THE unit. So how do we deal with module versioning and inter-module dependency management? How can we replace just one module at runtime?

The first part of the answer is: put the architectural focus on modularity! If you don’t group coherent functionality together, there’s no point using the second part of the answer. So that second part is: use a modular framework like OSGi (Jigsaw can be seen as an alternative to OSGi but is far less mature). But keep in mind that a modular framework is no guarantee, the key is a modular architecture.

How well does modular java play in the JavaEE game?

JavaEE is high level. OSGi is low level and provides no enterprise services (transactions, security, remote access, persistence, …) on its own. So you’re left with 3 options:

  • Deploying enterprise services as published OSGi services.
  • A hybrid approach with a classic JavaEE part + a modular part + a bridge between the 2 containers.
  • An OSGi “à la carte” approach.

The first option involves application servers that publish their services as OSGi modules. Glassfish does that. There is now a concept of WAB (Web Application Bundle) files, which are WAR files whose servlets are deployed as OSGi resources. A similar system exists for EJB bundles but there is no standard yet.

The second option can be implemented with Weld as CDI container to provide CDI inside the bundle and consume and produce OSGi services.

I’ll complete the description of the third option when I get access to the Parleys video (sorry, I ran out of ink at that moment and writing this article two weeks after the conference doesn’t help either).

Deploying to the cloud

The main concern here is that a modular application may involve hundreds of bundles. Fortunately, something like the Apache ACE platform can help you manage this and gives you the option to redeploy one module at a time.


I won’t go into the details of the demo. The easiest is to wait for the Parleys video, as it will be far more interesting than me trying to write down what I’ve seen.


Fork/Join and Akka: parallelizing the Java platform BeJUG conference

Here is another article on a BeJUG conference. This time, Sander Mak, software developer and architect at Info Support, The Netherlands, gave us an overview of two concurrency frameworks: Fork/Join and Akka.

Fork/Join is a framework that was added to the standard libraries starting with JDK 7. Akka is an open-source alternative with an emphasis on the resilience of concurrent processes.


Setting the scene

CPU speed has been growing over the last years until reaching some kind of technical limit at around 3.5 GHz. Right now, a CPU mainly idles while waiting for its I/O. That’s why the new trend is to have multiple CPUs.

But as Sander quoted: “the number of idle cores in my machine doubles every two years”. There is an architectural mismatch because developers tend to believe that the compiler and/or the JVM can handle parallelism on their own. Unfortunately, this isn’t true.


The first demos revolve around the computation of the Fibonacci sequence, whose definition is fib(n) = fib(n-1) + fib(n-2) with fib(0) = 0 and fib(1) = 1.

Of course, the objective here is not to find an optimal solution to that problem (transforming the recursive definition into an iterative form) but just apply a concurrent computation of the recursive form.

  1. We can solve this problem by creating one thread to compute fib(n-1) and another thread for fib(n-2), then waiting until both have finished and adding the results.
    Immediately, the number of threads explodes.
  2. If we implement the same algorithm with two Callable objects and a thread pool, the number grows more slowly but is still high.
    The problem is that the current thread blocks while its two children finish.
  3. With Fork/Join, the task dependency is explicit and thus the join method call doesn’t block the current worker thread.

Fork/Join works with an auto-sized thread pool. Each worker thread is assigned a task queue which gets fed by fork method calls. The interesting behavior is that a worker is allowed to steal work from the task queue of another worker.
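The Fibonacci demo can be sketched with a RecursiveTask along these lines (a minimal version of my own; the actual session code is at the bit.ly link given below). Note the sequential cutoff: small problems are computed directly instead of being forked.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Fork one subtask, compute the other in the current thread, then join.
class Fib extends RecursiveTask<Long> {
    private final int n;

    Fib(int n) { this.n = n; }

    @Override
    protected Long compute() {
        if (n <= 10) return seqFib(n);      // sequential cutoff
        Fib left = new Fib(n - 1);
        left.fork();                         // queued; may be stolen by an idle worker
        Fib right = new Fib(n - 2);
        return right.compute() + left.join();
    }

    private static long seqFib(int n) {
        return n <= 1 ? n : seqFib(n - 1) + seqFib(n - 2);
    }

    public static void main(String[] args) {
        System.out.println(new ForkJoinPool().invoke(new Fib(30))); // prints 832040
    }
}
```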

Another, more advanced, demo was also performed, demonstrating how to make a dichotomic search on a large set of cities to find which ones are within a certain distance from a point. Of course, the algorithm is implemented with Fork/Join.

All the code of those examples is available on http://bit.ly/bejug-fj.

API & patterns

Problem structure

The algorithm must be acyclic (no task may depend on another task that is already present in its call stack) and CPU-bound. I/O-bound problems wait a lot on blocking system calls and thus prevent those threads from performing other tasks.

Sequential cutoff

To avoid the overhead consuming all the computation time, you must set a threshold to decide whether the problem should be solved sequentially or in parallel. This leads to defining work chunks that are processed in parallel, while the steps inside a chunk are processed sequentially.

Fork once, fool me twice

Some algorithm implementations allow reusing the current thread to do some computation instead of forking a new task, thus limiting the overhead.

Convenience methods

There exist convenience methods:

  • Method invoke() is semantically equivalent to fork(); join() but always attempts to begin execution in the current thread.
  • Method invokeAll performs the most common form of parallel invocation: forking a set of tasks and joining them all.

Future and comparisons

Fork/Join creates threads. It is thus currently forbidden by the EJB spec. When it comes to the CDI or servlet specs, we are navigating in some kind of grey zone. Maybe this could work with a JCA work manager. @Asynchronous could be used as an alternative.

Anyway, it is foreseen that the JavaEE 7 spec may include the java.util.concurrent package.

Compared with Fork/Join, the more classic ExecutorService doesn’t allow work stealing. It is better suited to coarse-grained independent tasks. Bounded thread pools support blocking I/O better.

MapReduce implementations are targeted at clusters, not a single JVM. While Fork/Join is targeted at recursive tasks, MapReduce often works on a single map. Furthermore, MapReduce has no inter-node communication and thus doesn’t allow work stealing.

The popular critics about Fork/Join are:

  • The implementation is really complex.
  • The scalability is unknown above 100 cores, which may seem many for a CPU but is far below current standards for a GPU.
  • The one-to-one mapping between the thread pool size and the number of cores is probably too low-level.

With the availability of JDK 8, the Fork/Join API could be extended with methods on collections working with lambdas. There is also a CountedCompleter ForkJoinTask implementation that is better at working with I/O-based tasks and that is currently contained in the JSR-166 extras.



Akka is a concurrency framework written in Scala that offers a Java API. I couldn’t introduce Akka better than its authors, so here is what they say about it:

We believe that writing correct concurrent, fault-tolerant and scalable applications is too hard. Most of the time it’s because we are using the wrong tools and the wrong level of abstraction. Akka is here to change that. Using the Actor Model we raise the abstraction level and provide a better platform to build correct concurrent and scalable applications. For fault-tolerance we adopt the “Let it crash” model which have been used with great success in the telecom industry to build applications that self-heals, systems that never stop. Actors also provides the abstraction for transparent distribution and the basis for truly scalable and fault-tolerant applications.


An actor is an object with a local state, a local behavior and a mailbox to receive messages sent to it. An actor processes only one message at a time. And as they are lightweight (around 400 bytes of memory per actor), you can instantiate many of them in a standard JVM heap.

The receive method of an actor is called when message processing begins and it is where all the processing is done. The framework itself is responsible for actor management (thread pooling, dispatching, …).

Using a ForkJoinPool with Akka scales very well.
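To illustrate the model itself (this is NOT Akka's API, just a toy plain-Java sketch): an actor owns local state and a mailbox, and because a single consumer drains the mailbox one message at a time, the state needs no locking.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy actor: local state + mailbox, one message processed at a time.
class CounterActor implements Runnable {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private int count;                          // local state, never shared

    void tell(String message) { mailbox.add(message); }

    int count() { return count; }

    @Override
    public void run() {
        try {
            while (true) {
                String msg = mailbox.take();    // one message at a time
                if (msg.equals("stop")) return;
                count++;                         // safe: single consumer thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CounterActor actor = new CounterActor();
        Thread t = new Thread(actor);
        t.start();
        for (int i = 0; i < 5; i++) actor.tell("increment");
        actor.tell("stop");                      // FIFO: processed after the increments
        t.join();
        System.out.println(actor.count());       // prints 5
    }
}
```

Akka actors add on top of this exactly what the quote promises: supervision ("let it crash"), transparent distribution, and dispatcher-managed thread pools.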

A demo of Akka usage is available on http://bit.ly/bejug-akka.


Restructuring: Improving the modularity of an existing code-base BruJUG conference

Here is an article about a BruJUG conference given on 26/04/2012 by Chris Chedgey, founder of the Structure101 company. The complete conference video is available on Vimeo.

What is restructuring?

Refactoring and restructuring are both terms that imply non-functional changes.

Refactoring means changing the code to make it more readable. It implies invasive code editing and usually involves only a few classes.

Restructuring means reorganizing the code-base to improve its modularity and make it easier to understand. It involves minimally invasive code editing, but its scope is the whole code-base.

The code-base structure has 3 aspects:

  • the package composition
  • the dependencies between them
  • and the hierarchy of the “nested levels” of packages

The two code-base quality factors to consider here are complexity and modularity.

And why is it important?

Because a better code-base makes your code more understandable. And understandable code is cheaper to maintain and evolve. Changes have a more predictable impact on the code. And, of course, your code-base has better testability and reusability. In the end, your code has a better value.


Complexity can be measured by different means. Two of them are fatness and tangles.

Fatness is when you have too much code in one place (number of methods in a class, number of classes in a package, number of packages nested under the same package or in the same component, …).

Tangles occur when some code in a package references code in another package which itself references code back in the first package (cyclic dependencies).

Both fatness and tangles can be approximated automatically by metrics, which makes them good candidates for automatic checking against thresholds (e.g. in your build system).

There is a link between tangles and fatness. You can eliminate all tangles pretty easily by moving everything to the same place, but then you get 100% fatness. Conversely, you can eliminate fatness by partitioning your code-base, but then you create tangles. What you seek is a compromise between the two. What you really don’t want is a code-base that is both fat and full of tangles.


Modularity is best-defined by the mantra “high cohesion, low coupling”.

Modularity can show itself by multiple means. One of them is well-defined public interfaces while the remaining internals are kept private. Another one is when your packages have a clear responsibility.

Unfortunately, the best way to assess the modularity of the code is to have it checked by a human software architect.

So how can I work on my code-base structure?

Usually, the methods and classes are OK. But there is almost no logical organisation of classes into higher-level modules (= packages in Java). Packages are too often used like a filesystem rather than as an embodiment of a module hierarchy.

What you need to have a good code-base structure is to understand neatly the following aspects:

  • package composition and dependencies between packages
  • the flow of dependencies
  • the application business

Once you understand all of this correctly, you can define and achieve your architectural target.

Restructuring strategies

There are a lot of strategies you can use. Here are some chosen ones.

Merge parallel structures

If you have parallel structures (one for presentation, one for services, one for persistence, one for extranet, one for intranet, etc.), you’d better merge them to minimize the dependencies between packages.

Bust very large class tangles early

You often find yourself with one or a few large class tangles spanning many packages. Fixing these will improve your code-base rapidly.

Do as much as you can by only moving classes and packages first

This is the least invasive refactoring you can do to improve complexity and modularity. Moreover, it requires little effort.

Bottom-up or top-down approach?

Both are valid but have different impacts.

The top-down approach keeps as much of the existing package hierarchy as possible. This means that the “psychological” impact on the application team will be minimized.

The bottom-up approach tends to end up far from the current structure but is often easier to achieve.

Tackle complexity before modularity

A structure without tangles is way easier to manipulate.

Other strategies

  • Split packages that lack cohesion
  • Split fat packages and fat classes
  • Move tangles together
  • Make the restructuring a milestone


  • It is common for a code-base to be a mess regarding modularity matters.
  • That lack of structure costs money.
  • But it can be salvaged.
  • Restructuring your code-base is not easy, but it can yield huge returns.


Here ended the “theoretical part” of the presentation and began the examples, illustrated with the Restructure101 software, which helps the architect visualize the current structure of a code-base and simulate structure changes and their impacts.

The tool’s philosophy is to create a task list reflecting the changes made rather than changing the code-base directly. After a restructuring session, the architect ends up with a task list that he can perform himself or plan to be executed by other developers.

Plugins allow using that task list easily inside IDEs like Eclipse or IntelliJ.

I’d say I love this philosophy because it gives you the feeling that you’re always in control and that you are not just executing a drag’n’drop session in a GUI but really modifying your code-base deeply.

Thanks BruJUG for this enlightening conference.

See you next time.

JavaFX 2.0 BeJUG conference

History and status

The presentation started with a quick history of Java: how it started as a desktop application programming language, how that rich client facet of Java almost disappeared completely behind Java EE web applications years ago for economic reasons, and how it may come back in the near future with frameworks like JavaFX 2.

But what is JavaFX 2? JavaFX 2 is “a modern Java environment designed to provide a lightweight, hardware-accelerated UI platform that meets tomorrow’s needs”. In this assertion, every word is important. And if JavaFX 2 may address tomorrow’s needs, today it is still a work in progress. Only the Windows platform has attained the General Availability stage. The Mac OS X implementation should get out of the dark in the coming months and you can download a quite usable Linux implementation from the OpenJDK sources. But unfortunately, iOS and Android implementations of JavaFX are not yet available.

It is foreseen that JavaFX will replace Swing and AWT as the standard graphics library starting with Java 8. But that won’t happen before JavaFX becomes JavaFX 3.

So, pragmatically speaking, JavaFX 2 must be considered experimental at the moment (and that’s also what the demo done during the second part of the presentation confirmed). That is the advice of the first presenter too: “wait until Java 8 and JavaFX 3 before reaching production with a JavaFX application”.


There are two APIs for JavaFX 2: the FXML API, which is a declarative XML interface, and a Java API similar to the Swing interface.
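For illustration, a minimal FXML document might look like this sketch (a scene graph declared with standard JavaFX control classes; nothing from the presentation itself):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?import javafx.scene.control.Button?>
<?import javafx.scene.layout.StackPane?>

<!-- A pane containing a single button; the fx:id makes the control
     reachable from a Java controller class. -->
<StackPane xmlns:fx="http://javafx.com/fxml">
    <Button fx:id="hello" text="Hello, JavaFX 2"/>
</StackPane>
```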

A big difference from the former Swing API is the ability to render HTML content through the use of the WebView component. This makes it possible to enrich a classical web application with behaviours (such as Near Field Communication or eID reader integration) that are only possible in a rich client.

The first part of the presentation ended with a Hello World demo which looked to me a lot like the tutorials I’ve done with Swing. At least JavaFX looks more polished than its predecessor, and simple things seem simple to program.

Real-World demo

The real-world demo features healthcare software, made by the HealthConnect firm, to manage patients’ dossiers.

A lot of CSS was used to style the UI controls. The CSS properties are proprietary to JavaFX (they all begin with -fx-) but look similar to standard CSS properties.

The emphasis is put on the calendar and drop-up (with auto-complete) controls developed and heavily customized by HealthConnect.

It is also put on the observer pattern, which allows binding a UI control to a model property. Once done, every change to either the UI or the model is instantly reflected on the other. Unfortunately, to achieve this, JavaFX 2 created its own JavaBean-like API with, for example, SimpleObjectProperty and ObservableList.

There also exists a Task API to ease concurrency management while running service callbacks only in the main UI thread.

Here are the various lessons learned from the development of the application:

What has been easier with JavaFX 2?

  • great look and feel
  • customization thanks to CSS
  • the binding between the model and the view thanks to the observer pattern

What has been more difficult with JavaFX 2?

  • hard to change the default behaviour of controls
  • fighting against the JavaFX rules of engagement leads to weird results (but which framework doesn’t behave like that?)

Here are now some resources mentioned during the presentation:

I hope you enjoyed this third article. See you soon for the next one.