Test Driven Development and Java ME
At the very beginning of my career as a software developer, I was hired by a pioneering company in the field of wireless software. There I learned Java ME (J2ME at that time) almost at the same time I learned Java itself. As a newbie programmer, I was not aware of the benefits of good code design, nor was I familiar with methodologies like TDD, DDD, etc.
After some time, I left the company and moved from wireless devices to the server side, where I was introduced to Hibernate, Spring, EJB and all the design patterns and development methodologies surrounding these technologies. A long time passed, and now I am again working at a company that breathes the wireless market. Although most of the projects I work on target mobile phones as the access point, I spend most of my time building the backend that provides content to these clients. Still, now and then I find myself doing some maintenance on a Java ME application or adding a new feature to one, and when this happens I get a sense of déjà vu.
Even the best Java ME developers I have had the pleasure to work with still do not practice Test Driven Development and believe Dependency Injection is only useful in server-side environments. Most of the applications I have seen cannot be easily tested because of their heavy dependency on Java ME libraries and external resources. When I try or suggest refactoring, I usually hear something like “it’ll complicate things” or “you’re too server-oriented, in the mobile world we do not need so many interfaces”. I respectfully disagree.
Dependency Injection (DI) is the best friend of unit testing. Moreover, every time you instantiate an environment-specific object inside a domain class, a Ferrari crashes in Italy. Are you willing to give up automated testing, destroy a friendship and a Ferrari? Using DI you can remove your domain code's dependency on Java ME classes, make your code unit-testable and save the World (or at least some time).
I will go deeper into this topic at JavaOne 2010, where I will show some anti-patterns I’ve seen and discuss how to avoid them.
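To make the idea concrete, here is a minimal sketch of constructor injection. All names in it (ScoreStore, ScoreBoard, InMemoryScoreStore) are hypothetical and exist only for illustration. The domain class has no Java ME imports, so it can be unit tested on a desktop JVM with a fake store.

```java
// Hypothetical abstraction: the domain code only knows this interface.
interface ScoreStore {
    int loadHighScore();
    void saveHighScore(int score);
}

// Domain class with no environment-specific imports.
// The store is injected through the constructor instead of being instantiated here.
class ScoreBoard {
    private final ScoreStore store;

    ScoreBoard(ScoreStore store) {
        this.store = store;
    }

    // Persists the score and returns true only when it beats the stored record.
    boolean submit(int score) {
        if (score > store.loadHighScore()) {
            store.saveHighScore(score);
            return true;
        }
        return false;
    }
}

// Test double for unit tests. On a device, an implementation backed by
// javax.microedition.rms.RecordStore would be injected instead (not shown here).
class InMemoryScoreStore implements ScoreStore {
    private int highScore;

    public int loadHighScore() {
        return highScore;
    }

    public void saveHighScore(int score) {
        highScore = score;
    }
}
```

A unit test would build the board with new ScoreBoard(new InMemoryScoreStore()) and assert on submit, with no emulator or device involved.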
Design Pattern for processing a huge XML file: The Solution
This post is the sequel to my last one, which described the problem I recently faced when I needed to parse and process a big and complex XML file.
After playing around with the conventional solutions, I was not willing to give up the readability of XPath/DOM code in exchange for lower memory consumption.
To understand my solution, it’s important to analyze the data I’m working with. Although the schema is complex and the file contains lots of data, the root tag represents a list of entities (table records) and there is no dependency between nodes. They can (and may well) be processed in parallel.
My solution uses a hybrid producer-consumer implementation, where a reader class loads XML content into memory and dispatches small segments to a parser that processes each segment as if it were a complete file, but without memory consumption concerns.
The entire parsing process has three main steps:
Loading
First, I created a class named Reader, whose only purpose is to load the contents of a given XML file into memory and dispatch them for processing. This class has a buffer whose size is based on the number of loaded entities. In other words, if the XML file contains a root tag named Cars with a list of Car nodes, the buffer counts occurrences of </Car>. When a given number of entities has been loaded, the data is dispatched for processing and the buffer is reset.
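The buffering idea can be sketched as follows. This is not the original Reader class, only an illustration under some assumptions: the class name ChunkingReader and its constructor parameters are made up, the root opening tag carries no attributes, and each dispatched segment is re-wrapped in the root tag so that it forms a complete document.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;

// Hypothetical sketch: splits <Cars><Car>..</Car>..</Cars> into small documents.
class ChunkingReader {
    private final String rootOpen;
    private final String rootClose;
    private final String entityClose;
    private final int entitiesPerChunk;
    private final Consumer<String> dispatcher;

    ChunkingReader(String rootName, String entityName,
                   int entitiesPerChunk, Consumer<String> dispatcher) {
        this.rootOpen = "<" + rootName + ">";   // assumes a root tag without attributes
        this.rootClose = "</" + rootName + ">";
        this.entityClose = "</" + entityName + ">";
        this.entitiesPerChunk = entitiesPerChunk;
        this.dispatcher = dispatcher;
    }

    void read(Reader input) throws IOException {
        BufferedReader reader = new BufferedReader(input);
        StringBuilder buffer = new StringBuilder();
        boolean insideRoot = false;
        int entities = 0;
        int c;
        while ((c = reader.read()) != -1) {
            buffer.append((char) c);
            if (!insideRoot) {
                // Discard the prolog and the root opening tag.
                if (endsWith(buffer, rootOpen)) {
                    insideRoot = true;
                    buffer.setLength(0);
                }
            } else if (endsWith(buffer, entityClose)) {
                entities++;
                if (entities == entitiesPerChunk) {
                    // Dispatch a complete small document and reset the buffer.
                    dispatcher.accept(rootOpen + buffer + rootClose);
                    buffer.setLength(0);
                    entities = 0;
                }
            }
        }
        if (entities > 0) {
            // Flush the last, partially filled chunk without the original root close.
            String rest = buffer.toString();
            int end = rest.lastIndexOf(rootClose);
            if (end >= 0) {
                rest = rest.substring(0, end);
            }
            dispatcher.accept(rootOpen + rest + rootClose);
        }
    }

    private static boolean endsWith(StringBuilder sb, String suffix) {
        int start = sb.length() - suffix.length();
        return start >= 0 && sb.indexOf(suffix, start) == start;
    }
}
```

Only one chunk is held in memory at a time, so the memory footprint depends on the chunk size rather than on the file size.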
Dispatching
Instead of dispatching data directly to the Parser, the Reader object holds an implementation of ParserDispatcher and uses it for this job. The idea behind this is to abstract the execution environment from the rest of the code. While a valid ParserDispatcher implementation for a server-side environment would post the data to a JMS Queue, my command-line desktop application uses an ExecutorService for the same purpose.
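A desktop dispatcher along these lines could look like the sketch below. The method signatures are assumptions, since the post only names the types: dispatch(String), the SegmentParser interface and the ExecutorDispatcher class are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Assumed shape of the parser that receives each segment.
interface SegmentParser {
    void parse(InputStream xml);
}

// Assumed shape of the abstraction the post calls ParserDispatcher.
interface ParserDispatcher {
    void dispatch(String xmlSegment);
}

// Desktop implementation: hands each segment to a worker thread.
class ExecutorDispatcher implements ParserDispatcher {
    private final SegmentParser parser;
    private final ExecutorService pool;

    ExecutorDispatcher(SegmentParser parser, int threads) {
        this.parser = parser;
        this.pool = Executors.newFixedThreadPool(threads);
    }

    public void dispatch(String xmlSegment) {
        byte[] bytes = xmlSegment.getBytes(StandardCharsets.UTF_8);
        // The parser sees an ordinary InputStream, as if it were a small file.
        pool.execute(() -> parser.parse(new ByteArrayInputStream(bytes)));
    }

    // Stops accepting segments and waits for queued ones; returns false on timeout.
    boolean shutdownAndWait(long seconds) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(seconds, TimeUnit.SECONDS);
    }
}
```

One caveat of this sketch: a fixed thread pool queues tasks without a bound, so if reading is much faster than parsing, pending segments can pile up in memory. A bounded queue would be one way to apply back-pressure.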
Parsing
The parsing process itself has nothing new, apart from the fact that the huge XML file, once broken into small blocks, can be parsed with XPath/DOM without compromising memory consumption or performance. The Parser class is a common XML parser, unaware of the prior stages the data went through. It can parse any given InputStream, as long as it points to XML content that complies with its schema and is small enough to be completely represented by a DOM structure in memory. After each entity is parsed, a list of listeners is notified. These listeners can persist, log, count, create reports, etc.
A client application would run by calling the following lines:
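As an illustration of this step, the sketch below parses one small segment with the standard DOM and XPath APIs and notifies listeners for each entity. The names XPathSegmentParser and EntityListener and the "/Cars/Car" expression are assumptions for the example, not the original Parser class.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical listener: could persist, log, count, build reports, and so on.
interface EntityListener {
    void onEntity(Element entity);
}

// Hypothetical parser for one small segment that fits comfortably in memory.
class XPathSegmentParser {
    private final List<EntityListener> listeners = new ArrayList<>();
    private final String entityPath; // for example "/Cars/Car"

    XPathSegmentParser(String entityPath) {
        this.entityPath = entityPath;
    }

    void addListener(EntityListener listener) {
        listeners.add(listener);
    }

    void parse(InputStream xml) throws Exception {
        // Build a DOM tree for the segment only, never for the whole file.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(xml);
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList entities =
                (NodeList) xpath.evaluate(entityPath, doc, XPathConstants.NODESET);
        for (int i = 0; i < entities.getLength(); i++) {
            Element entity = (Element) entities.item(i);
            // Notify every listener once per parsed entity.
            for (EntityListener listener : listeners) {
                listener.onEntity(entity);
            }
        }
    }
}
```

Listeners should be registered before parsing starts if segments are parsed on several threads, because the listener list in this sketch is not synchronized.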
Parser parser = new Parser();
parser.addListener(new DebugListener());
Reader xmlReader = new Reader(new NewThreadDispatcher(parser));
xmlReader.read(new FileInputStream("file.xml"));
My friend Paulo Jeronimo commented on my last post, suggesting that I use StAX. Being a pull parser over a stream, StAX tries to bring the best of both worlds, but in my opinion code using StAX is not as readable as the equivalent XPath/DOM code, which is why I decided to create my own design.
Although this is neither the fastest nor the simplest solution, I believe it strikes a good balance between performance, memory consumption and code maintainability. Moreover, this pattern can be extended to other file types.
Design Pattern for processing a huge XML file: The Problem
Some days ago I started working on a project that requires parsing and storing information contained in huge files of different formats. These files are sent by partners of our client and represent data contained in their databases. Sometimes this data is consistent and useful for our system, other times it’s just crap. As we do not have access to their databases, we have to parse the files, store the contents in a database and then query this data in order to understand how consistent and complete it is.
Last week my manager asked me to parse the content of an XML file of more than 500 MB. The result of this activity would tell us about the quality of the data that this partner could provide, and then we would be able to decide whether the process of parsing and storing such a schema would be permanently added to the system or just thrown away.
Although the system runs in a Java EE container, for a one-off process like this I find it much easier to create a Java SE application that receives a filename as a parameter, parses the file and stores its contents. On the other hand, if the result shows that the partner’s database is consistent and useful enough, this is no longer a one-off process and the code must be added to the project. Given that, it was strongly recommended to implement the code in a way that would let it be easily refactored from the desktop to the server environment.
As I said before, the file was larger than 500 MB, and this was just a test file; the next ones (if any) might be bigger than one gigabyte. Loading that much content onto the heap wouldn’t be possible in production. Since DOM was not an option, XPath was also discarded and SAX became the only option. The problem now was that the schema is very complex, and the code necessary to parse it using SAX would easily become too messy to be maintained.
Well, that's enough for today! Now I'll let you think about this problem and in a few days I'll describe my solution here.
UPDATE: The post with my solution to this problem can be found here.
See you soon!
