Devoxx09 presentation on Internal DSL

by Daniel Gazineu on April 7th, 2011

Before a whole year went by since my last blog post, I decided to update this blog with some old (but never blogged) content.

Here are the slides from my presentation on Internal DSLs at Devoxx09.


Design Pattern for processing a huge XML file: The Solution

by on April 27th, 2010

This post is the sequel to my last one, which described the problem I recently faced when I needed to parse and process a big and complex XML file.

After playing around with the conventional solutions, I was not convinced that I should give up the legibility of XPath/DOM code in exchange for efficient memory consumption.

To understand my solution, it’s important to analyze the data I’m working with. Although the schema is complex and the file contains lots of data, the root tag represents a list of entities (table records), and there are no dependencies between nodes. They can (and may well) be processed in parallel.

My solution uses a hybrid producer-consumer implementation: a reader class loads the XML contents into memory and dispatches small segments to a parser, which processes each segment as if it were a complete file, but without any memory consumption concerns.

The entire parsing process has three main steps:

Loading

First, I created a class named Reader, whose only purpose is to load the contents of a given XML file into memory and dispatch them for processing. This class has a buffer whose size is based on the number of loaded entities. In other words, if the XML file contains a root tag named Cars with a list of Car nodes, the buffer counts occurrences of </Car>. When a given number of entities has been loaded, the data is dispatched for processing and the buffer is reset.
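The post doesn't include the Reader's source, so here is a minimal sketch of the loading step under some assumptions: the class name ChunkingReader, its constructor parameters and the Consumer<String> sink (standing in for the ParserDispatcher) are hypothetical, and it assumes entity elements carry no attributes and are never nested inside each other. It scans the input character by character, counts closing tags such as </Car>, and every batchSize entities wraps them in the root tag so that each chunk is a small, well-formed document.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of the loading step. Assumes entity elements have no
 * attributes (so they always open with "<Car>") and are never nested.
 */
class ChunkingReader {
    private final String root;      // e.g. "Cars"
    private final String openTag;   // e.g. "<Car>"
    private final String closeTag;  // e.g. "</Car>"
    private final int batchSize;    // entities per dispatched chunk
    private final Consumer<String> sink; // stands in for the ParserDispatcher

    ChunkingReader(String root, String entity, int batchSize, Consumer<String> sink) {
        this.root = root;
        this.openTag = "<" + entity + ">";
        this.closeTag = "</" + entity + ">";
        this.batchSize = batchSize;
        this.sink = sink;
    }

    void read(Reader in) {
        StringBuilder pending = new StringBuilder();  // text since the last complete entity
        StringBuilder entities = new StringBuilder(); // complete entities not yet dispatched
        int count = 0;
        try {
            int c;
            while ((c = in.read()) != -1) {
                pending.append((char) c);
                if (endsWith(pending, closeTag)) {
                    // keep only the entity, dropping the prolog, root tag and whitespace
                    int start = pending.lastIndexOf(openTag);
                    entities.append(pending, start, pending.length());
                    pending.setLength(0);
                    if (++count == batchSize) {
                        dispatch(entities);
                        count = 0;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (count > 0) {
            dispatch(entities); // flush the last, possibly smaller, batch
        }
    }

    private void dispatch(StringBuilder entities) {
        sink.accept("<" + root + ">" + entities + "</" + root + ">");
        entities.setLength(0); // reset the buffer
    }

    private static boolean endsWith(StringBuilder sb, String suffix) {
        int offset = sb.length() - suffix.length();
        return offset >= 0 && sb.indexOf(suffix, offset) == offset;
    }
}
```

For a real file you would pass something like new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8)), since reading one character at a time from an unbuffered stream is slow. At any moment only the current, unfinished batch is held in memory.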

Dispatching

Instead of dispatching data directly to the Parser, the Reader object holds an implementation of ParserDispatcher and delegates this job to it. The idea behind this is to isolate the execution environment from the rest of the code. While a valid ParserDispatcher implementation for a server-side environment would post the data to a JMS queue, my command-line desktop application uses an ExecutorService for the same purpose.
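One way this abstraction could look is sketched below. Only the ParserDispatcher name comes from the post; the dispatch(String) signature, the ExecutorServiceDispatcher class and the Consumer<String> standing in for the Parser are assumptions. The desktop implementation hands each chunk to a fixed thread pool; a server-side implementation would implement the same interface and post each chunk to a JMS queue instead (not shown).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** The Reader only talks to this abstraction (signature assumed for illustration). */
interface ParserDispatcher {
    void dispatch(String xmlChunk);
}

/** Hypothetical desktop implementation: every chunk becomes a task on a thread pool. */
class ExecutorServiceDispatcher implements ParserDispatcher {
    private final ExecutorService executor;
    private final Consumer<String> parser; // stands in for the post's Parser

    ExecutorServiceDispatcher(int threads, Consumer<String> parser) {
        this.executor = Executors.newFixedThreadPool(threads);
        this.parser = parser;
    }

    @Override
    public void dispatch(String xmlChunk) {
        // chunks are independent, so they can be parsed in any order, in parallel
        executor.execute(() -> parser.accept(xmlChunk));
    }

    /** Stops accepting chunks and blocks until every dispatched chunk has been parsed. */
    public void awaitCompletion() {
        executor.shutdown();
        try {
            if (!executor.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("Parsing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for parsers", e);
        }
    }
}
```

Swapping environments then means swapping the ParserDispatcher implementation; neither the Reader nor the Parser has to change.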

Parsing

The parsing itself brings nothing new, except that the huge XML file, once broken into small blocks, can be parsed with XPath/DOM without compromising memory consumption or performance. The Parser class is an ordinary XML parser, unaware of the earlier stages the data went through: it can parse any given InputStream, as long as the stream holds XML content that complies with its schema and is small enough to be completely represented as a DOM tree in memory. After each entity is parsed, a list of listeners is notified. These listeners can persist, log, count, create reports, etc.
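A minimal sketch of such a parser, using the JDK's built-in DOM and XPath APIs. The ChunkParser and EntityListener names, the entity XPath expression passed to the constructor and the onEntity(Node) callback are hypothetical; the post's own Parser class is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical listener contract: called once for every parsed entity. */
interface EntityListener {
    void onEntity(Node entity);
}

/** Sketch of the parsing step: plain DOM + XPath over one small, self-contained chunk. */
class ChunkParser {
    // copy-on-write so chunks parsed on several threads can iterate listeners safely
    private final List<EntityListener> listeners = new CopyOnWriteArrayList<>();
    private final String entityXPath; // e.g. "/Cars/Car"

    ChunkParser(String entityXPath) {
        this.entityXPath = entityXPath;
    }

    void addListener(EntityListener listener) {
        listeners.add(listener);
    }

    /** Parses any stream holding a chunk small enough to live in memory as a DOM tree. */
    void parse(InputStream chunk) {
        try {
            // factories are created per call because they are not guaranteed to be thread-safe
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(chunk);
            NodeList entities = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(entityXPath, doc, XPathConstants.NODESET);
            for (int i = 0; i < entities.getLength(); i++) {
                for (EntityListener listener : listeners) {
                    listener.onEntity(entities.item(i)); // persist, log, count, report...
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalStateException("Could not parse chunk", e);
        }
    }
}
```

Because every chunk is small, building a full DOM per chunk stays cheap, and the XPath expression reads the same as it would against the whole file.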

A client application would run the whole process with the following lines:


Parser parser = new Parser();
parser.addListener(new DebugListener());
Reader xmlReader = new Reader(new NewThreadDispatcher(parser));
xmlReader.read(new FileInputStream("file.xml"));

My friend Paulo Jeronimo commented on my last post, suggesting that I use StAX. As a pull parser over a stream, StAX tries to bring the best of both worlds, but in my opinion, code using StAX is not as legible as code using XPath/DOM. That’s why I decided to create my own design.
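To make the legibility comparison concrete, here is roughly what StAX pull parsing looks like for a hypothetical Cars/Car/id schema, collecting the same values the XPath expression /Cars/Car/id would select. The class name and the schema are assumptions for illustration; the calls come from the standard javax.xml.stream API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Collects the text of every <id> inside a <Car> with a StAX cursor (hypothetical schema). */
final class StaxCarIds {
    static List<String> read(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            List<String> ids = new ArrayList<>();
            boolean insideCar = false; // position in the document, tracked by hand
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if ("Car".equals(r.getLocalName())) {
                        insideCar = true;
                    } else if (insideCar && "id".equals(r.getLocalName())) {
                        ids.add(r.getElementText()); // also advances the cursor to </id>
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "Car".equals(r.getLocalName())) {
                    insideCar = false;
                }
            }
            r.close();
            return ids;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Even in this tiny case the code has to keep track of where it is in the document (the insideCar flag), which is the kind of bookkeeping that a single XPath expression hides.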

Although my solution is neither the most performant nor the simplest one, I believe it strikes a good balance between performance, memory consumption and code maintainability. Moreover, this pattern can be extended to other file types.


Design Pattern for processing a huge XML file: The Problem

by on April 14th, 2010

A few days ago I started working on a project that requires parsing and storing information contained in huge files of different formats. These files are sent by our client’s partners and represent data from their databases. Sometimes this data is consistent and useful for our system; other times it’s just crap. As we do not have access to their databases, we need to parse the data, store it in a database and then query it in order to understand how consistent and complete it is.

Last week my manager asked me to parse the content of an XML file of more than 500MB. The result of this activity would tell us about the quality of the data that partner could provide, and we would then be able to decide whether the process of parsing and storing this schema should be permanently added to the system or just thrown away.

Although the system runs in a Java EE container, for a one-off process like this I find it much easier to create a Java SE application that receives a filename as a parameter, parses the file and stores its data. On the other hand, if the result shows that the partner’s database is consistent and useful enough, this is no longer a one-off process, and the code must be added to the project. Given that, it was strongly recommended to implement the code in a way that it could be easily refactored from a desktop to a server environment.

As I said before, the file was larger than 500MB, and this was just a test file; the next ones (if any) might be bigger than one gigabyte. Loading such large content into the heap wouldn’t be possible in production. Since DOM was not an option, XPath (which in this setup runs over a DOM tree) was also discarded, and SAX became the only option. The problem then was that the schema is very complex, and the code needed to parse it with SAX would easily become too messy to maintain.
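To illustrate why SAX code tends to get messy, here is a minimal handler for a hypothetical Cars/Car/id schema. The class name and schema are assumptions; DefaultHandler and SAXParserFactory are the standard JDK APIs. Even for a single field, the handler needs a state flag and a text buffer, because SAX only reports events and may deliver an element's text across several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Collects the text of every <id> inside a <Car>, tracking parser state by hand. */
class CarIdHandler extends DefaultHandler {
    private final List<String> ids = new ArrayList<>();
    private boolean insideCar;
    private StringBuilder text; // non-null only while inside <Car><id>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("Car")) {
            insideCar = true;
        } else if (insideCar && qName.equals("id")) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length); // may be called several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("id") && text != null) {
            ids.add(text.toString());
            text = null;
        } else if (qName.equals("Car")) {
            insideCar = false;
        }
    }

    static List<String> collectIds(String xml) {
        try {
            CarIdHandler handler = new CarIdHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.ids;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse", e);
        }
    }
}
```

Every additional field or nesting level in a complex schema adds more flags and branches like these.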

Well, that’s enough for today! Now I’ll let you think about this problem and in a few days I’ll describe my solution here.

UPDATE: The post with my solution to this problem can be found here.

See you soon!


Talk accepted at Devoxx09

by on October 16th, 2009

Continuing the series of news, I’m glad to announce that my Quickie Talk “Fluent Validation Framework – a DSL for validations using Fluent Interfaces” was accepted at Devoxx09.

I’m very happy to have the opportunity to contribute to such an important event.

I’ll have only 15 minutes there (that’s why it’s called a “quickie”), but I will try to bring as much content as I can about developing Fluent Interfaces and the issues involved. The talk will be based on my validation framework, which I think was a nice exercise in Fluent Interface development.

The abstract can be found here.

See you next month at Devoxx09!


A DSL for validations using fluent interfaces

by on August 9th, 2009

As a software developer, I’m particularly interested in Domain Driven Design (DDD) and Domain Specific Languages (DSLs). In the last few weeks I’ve been flirting with Fluent Interfaces and trying to get used to their techniques in order to apply them to building better DSLs for my domains.

During my studies, I decided to build a small validation framework with a fluent interface to put into practice what I was reading. It’s far from ready for professional use, but developing it has been a good exercise with fluent interfaces and DSLs.

From the beginning, what I was looking for was a way to let developers write validations in a more human-friendly way.

Let’s say you have an object called ‘myObject’. How would you ask a validation framework to ensure it is not null?
Well, I asked myself this question, and one of the possible answers was:

“Ensure myObject is not null.”

This part was pretty easy to implement and the first draft of the Fluent Validation Framework (as I call it) looked like this:

ensure(myObject).isNotNull();
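The framework's source isn't shown in this post, so the following is only a minimal sketch of how this first, method-chaining draft could work. ChainedValidator and the exception it throws are hypothetical; isNotNull() and isEqualsTo() follow the method names used in the post. Each method runs its check immediately and returns the same object so the chain can continue.

```java
import java.util.Objects;

/** Hypothetical first draft: ensure(...) starts a chain, and each check runs immediately. */
final class ChainedValidator {
    private ChainedValidator() {}

    static <T> Subject<T> ensure(T value) {
        return new Subject<>(value);
    }

    static final class Subject<T> {
        private final T value;

        private Subject(T value) {
            this.value = value;
        }

        /** Fails at once, so there is no way to customize the failure afterwards. */
        Subject<T> isNotNull() {
            if (value == null) {
                throw new IllegalArgumentException("value cannot be null");
            }
            return this;
        }

        Subject<T> isEqualsTo(Object other) {
            if (!Objects.equals(value, other)) {
                throw new IllegalArgumentException(value + " is not equal to " + other);
            }
            return this;
        }
    }
}
```

Because isNotNull() has already thrown by the time any later method in the chain could run, this shape has no room for an otherwise(...) clause.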

But while I wanted to be able to write something as simple as:

ensure(myObject).isEqualsTo(myOtherObject);

I also wanted more control over the result of the validation. Something like:

ensure(myObject).isNotNull().otherwise("myObject cannot be null");

Or even better:

ensure(myObject).isNotNull().otherwise()
                            .throwThis(new IllegalStateException());

Both examples above expose a problem inherent to the Method Chaining technique: while isNotNull() is executing, it has no way of knowing that another method will be called after it. So, with this approach, it would be necessary to add an execution command at the end of the chain:

ensure(myObject).isNotNull().otherwise()
                            .throwThis(new IllegalStateException())
                            .now();

This works, but makes our DSL too verbose. A simple null check would be:

 ensure(myObject).isNotNull().now();
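A minimal sketch of this deferred style, assuming a hypothetical DeferredValidation class: each chained call only records what should happen, and nothing runs until now() closes the sentence. Using Predicate and Supplier internally is my assumption, not necessarily how the framework was built.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical sketch: the chain only records steps; now() finally executes them. */
final class DeferredValidation<T> {
    private final T value;
    private final List<Predicate<T>> checks = new ArrayList<>();
    private Supplier<? extends RuntimeException> failure =
            () -> new IllegalArgumentException("validation failed");

    private DeferredValidation(T value) {
        this.value = value;
    }

    static <T> DeferredValidation<T> ensure(T value) {
        return new DeferredValidation<>(value);
    }

    DeferredValidation<T> isNotNull() {
        checks.add(v -> v != null); // recorded, not executed yet
        return this;
    }

    DeferredValidation<T> otherwise() {
        return this; // only there to make the sentence read well
    }

    DeferredValidation<T> throwThis(RuntimeException e) {
        failure = () -> e; // replaces the default failure
        return this;
    }

    /** Terminal command: only here is the sentence known to be complete. */
    void now() {
        for (Predicate<T> check : checks) {
            if (!check.test(value)) {
                throw failure.get();
            }
        }
    }
}
```

Forgetting now() silently skips the validation, which is another reason this terminal method is unattractive.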

Moreover, I didn’t like this now() method at the end of the sentence; it’s neither fluent nor intuitive.
So I decided to move to the Nested Function approach. After some rethinking, the API usage for the same situations listed above became:

ensure(myObject).is(notNull(), otherwise(
                                   throwThis(new IllegalStateException())));

The short validation also works with fewer parameters:

ensure(myObject).is(notNull());

This brings a nice side benefit: passing the validation as a parameter to the is(Condition) method enables developers to create their own validation conditions and extend the framework to work with their domain. On the other hand, it requires more static imports; without them, the code would be:

Validator.ensure(myObject).is(Conditions.notNull(),
                              Actions.otherwise(
                                  Actions.throwThis(
                                      new IllegalStateException())));

That is too much code for a simple if/else block. Moreover, such verbose code doesn’t seem very human-oriented.
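For reference, the Nested Function version can be sketched as follows. Condition, Action, notNull(), otherwise() and throwThis() follow the names in the post, but their signatures, and the choice to put the factory methods in a single hypothetical Validator class (instead of separate Conditions and Actions classes), are assumptions. Because every argument is fully built before is() runs, is() can execute the check immediately, with no terminal now().

```java
/** A check on a value; developers can implement their own (hypothetical signature). */
interface Condition<T> {
    boolean isSatisfiedBy(T value);
}

/** Something to do when a check fails (hypothetical signature). */
interface Action {
    void execute();
}

final class Validator {
    private Validator() {}

    static <T> Subject<T> ensure(T value) {
        return new Subject<>(value);
    }

    // condition factory
    static <T> Condition<T> notNull() {
        return v -> v != null;
    }

    // action factories
    static Action throwThis(RuntimeException e) {
        return () -> { throw e; };
    }

    static Action otherwise(Action action) {
        return action; // pure syntactic sugar for readability
    }

    static final class Subject<T> {
        private final T value;

        private Subject(T value) {
            this.value = value;
        }

        /** Short form: fails with a default exception. */
        void is(Condition<? super T> condition) {
            is(condition, throwThis(new IllegalArgumentException("validation failed")));
        }

        /** All arguments already exist, so the check runs right away. */
        void is(Condition<? super T> condition, Action onFailure) {
            if (!condition.isSatisfiedBy(value)) {
                onFailure.execute();
            }
        }
    }
}
```

A developer-defined condition is just another Condition, for example Validator.ensure(name).is(s -> !s.isEmpty()).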

After thinking for a while, I figured out that the technical problem I was facing could be solved by a linguistic review of the solution. It’s true that an easy way to ask a framework to validate something is to say:

Ensure myObject is not null, otherwise, throw this exception.

But it can also be said:

Analyze myObject and throw this exception if it’s null.

This led me to the following result:

ensure(myObject).isNotNull();

or:

analyse(myObject).and().throwThis(new IllegalStateException()).ifNull();

Moreover, internally this solution uses the methods is(Condition), ifIsNot(Condition) and and(Action), which enables developers to create their own conditions and actions:

ensure(myObject).is(inValidState());

or:

analyse(myObject).and(logAnError()).ifIsNot(inValidState());

This assumes that the methods inValidState() and logAnError() were created by the developer and return a Condition and an Action, respectively.
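Here is a minimal sketch of the analyse(...) style, again with assumed signatures: the hypothetical Analysis class stores the action and runs it from the last word of the sentence, ifNull() or ifIsNot(Condition), which is what removes the need for now(). Condition and Action are redeclared here so the sketch is self-contained.

```java
/** A check on a value (hypothetical signature). */
interface Condition<T> {
    boolean isSatisfiedBy(T value);
}

/** Something to do when a check fails (hypothetical signature). */
interface Action {
    void execute();
}

/** Hypothetical sketch: the condition comes last, so it can trigger execution. */
final class Analysis<T> {
    private final T value;
    private Action action;

    private Analysis(T value) {
        this.value = value;
    }

    static <T> Analysis<T> analyse(T value) {
        return new Analysis<>(value);
    }

    Analysis<T> and() {
        return this; // connector word, only for readability
    }

    Analysis<T> and(Action action) {
        this.action = action; // developer-defined action
        return this;
    }

    Analysis<T> throwThis(RuntimeException e) {
        return and(() -> { throw e; });
    }

    /** Closing words of the sentence: the check runs here. */
    void ifIsNot(Condition<? super T> condition) {
        if (!condition.isSatisfiedBy(value) && action != null) {
            action.execute();
        }
    }

    void ifNull() {
        ifIsNot(v -> v != null);
    }
}
```

With this design, the developer's inValidState() would simply be a static method returning a Condition, and logAnError() one returning an Action that writes to a log.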

My plan is to improve this framework by adding specific validations for Strings, Numbers, Arrays and Collections, leaving domain-specific validations to be built inside their own domains.

Soon I’ll be posting and discussing this code here, but now it’s time to sleep!


Exposing bitwise operations in a fluent interface

by on July 20th, 2009

Some days ago I had to develop a class that would represent an event and should contain a weekly recurrence attribute. The first idea that came to mind was to use the seven lowest-order bits of a byte as the days of the week, flagging them on and off with bitwise operators. That’s a pretty common solution for this kind of problem, and what I want to share here is not the solution itself but the interface I defined to expose it.
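The bitwise idea itself fits in a few lines. This hypothetical WeekBits helper numbers the days 0 (Sunday) to 6 (Saturday), matching the Day enum values in the WeeklyRecurrence class listed later in this post, and sets, clears and tests one bit per day.

```java
/** Hypothetical helper: bit i of the byte (0 = Sunday ... 6 = Saturday) flags day i. */
final class WeekBits {
    static final byte EVERY_DAY = 0b0111_1111; // 127: all seven low bits on

    private WeekBits() {}

    /** Turns the given day's bit on. */
    static byte set(byte days, int day) {
        return (byte) (days | (1 << day));
    }

    /** Turns the given day's bit off, leaving the others untouched. */
    static byte clear(byte days, int day) {
        return (byte) (days & ~(1 << day));
    }

    /** Tells whether the given day's bit is on. */
    static boolean isSet(byte days, int day) {
        return (days & (1 << day)) != 0;
    }
}
```

For example, clearing Sunday from EVERY_DAY leaves 126 (binary 01111110).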
I tried to define a fluent interface for the class, and right now this is how a client uses it:

WeeklyRecurrence rec = new WeeklyRecurrence().repeatOn(EVERY_DAY).but(TUESDAY,THURSDAY);
assertTrue(rec.occursOn(WEDNESDAY));
assertFalse(rec.doesNotOccurOn(FRIDAY));

Moreover, I created an escape() method to mark days on which the recurrence doesn’t occur:

rec = rec.escape(WEDNESDAY).but(TUESDAY);
assertTrue(rec.doesNotOccurOn(WEDNESDAY));
assertTrue(rec.occursOn(TUESDAY));

As you can see, the but() method is contextual: it adds days to or removes days from a recurrence, always doing the opposite of the last operation.
You may also notice that WeeklyRecurrence is immutable, in order to avoid complications with changing state. Therefore, the following holds:

WeeklyRecurrence r1 = new WeeklyRecurrence();
WeeklyRecurrence r2 = r1.repeatOn(EVERY_DAY);
assertFalse(r1.equals(r2));
 
r1 = new WeeklyRecurrence().repeatOn(MONDAY);
r2 = new WeeklyRecurrence().repeatOn(MONDAY);
assertTrue(r1.equals(r2));

The complete code for WeeklyRecurrence follows. The idea of this post is to show that even “low-level” solutions can expose fluent and clean interfaces to the other application layers.

/**
 * Represents recurrence of something in a week.
 */
public class WeeklyRecurrence {
 
	// bit flags, one per day of the week; final because instances are immutable
	private final byte recurrence;
 
	/**
	 * Utility method to merge n days with an original recurrence value
	 */
	private static byte merge(byte original, Day... days) {
		for (Day day : days) {
			original |= day.value;
		}
		return original;
	}
 
	/**
	 * Utility method to remove n days from an original recurrence
	 */
	private static byte diff(byte original, Day... days) {
		byte escape = merge((byte) 0, days);
		return (byte) (original & ~escape); // clear every bit flagged in escape
	}
 
	/**
	 * Enum that represents days of a week
	 */
	public enum Day {
 
		// 00000001
		SUNDAY((byte) 1),
 
		// 00000010
		MONDAY((byte) 2),
 
		// 00000100
		TUESDAY((byte) 4),
 
		// 00001000
		WEDNESDAY((byte) 8),
 
		// 00010000
		THURSDAY((byte) 16),
 
		// 00100000
		FRIDAY((byte) 32),
 
		// 01000000
		SATURDAY((byte) 64),
 
		// 01111111
		EVERY_DAY((byte) 127);
 
		private final byte value;
 
		private Day(byte value) {
			this.value = value;
		}
	}
 
	/**
	 * Just for internal use: initializes a <code>WeeklyRecurrence</code> with a byte value.
	 */
	private WeeklyRecurrence(byte recurrence) {
		this.recurrence = recurrence;
	}
 
	/**
	 * Initializes a <code>WeeklyRecurrence</code> with no recurrence.
	 */
	public WeeklyRecurrence() {
		this((byte) 0);
	}
 
	/**
	 * Discovers if given recurrence is set. It returns <code>true</code> even
	 * if there are more days in the current recurrence.
	 */
	private boolean isSet(byte otherRecurrence) {
		return (recurrence & otherRecurrence) == otherRecurrence;
	}
 
	/**
	 * Creates a new <code>WeeklyRecurrence</code> merging given days with the
	 * one in the current recurrence instance.
	 */
	public WeeklyRecurrence repeatOn(Day... days) {
		byte newRecurr = merge(this.recurrence, days);
 
		return new WeeklyRecurrence(newRecurr) {
			public WeeklyRecurrence but(Day... days) {
				return escape(days);
			}
		};
	}
 
	/**
	 * Creates a new <code>WeeklyRecurrence</code> instance removing given days
	 * from the current recurrence instance.
	 */
	public WeeklyRecurrence escape(Day... days) {
		byte newRecurr = diff(this.recurrence, days);
		return new WeeklyRecurrence(newRecurr) {
			public WeeklyRecurrence but(Day... days) {
				return repeatOn(days);
			}
		};
	}
 
	/**
	 * Creates a new <code>WeeklyRecurrence</code> including or removing given
	 * days to the current instance, always in the opposite way of the last
	 * operation. If no recurrence was set, it throws an
	 * <code>IllegalStateException</code>
	 */
	public WeeklyRecurrence but(Day... days) {
		throw new IllegalStateException("No recurrence was set yet");
	}
 
	/**
	 * Informs if none of the given days are in the current recurrence.
	 */
	public boolean doesNotOccurOn(Day... days) {
		for (Day day : days) {
			if (isSet(day.value))
				return false;
		}
		return true;
	}
 
	/**
	 * Informs whether all of the given days are in the current recurrence.
	 */
	public boolean occursOn(Day... days) {
		byte rec = merge((byte) 0, days);
		return isSet(rec);
	}
 
	@Override
	public int hashCode() {
		final int prime = 31;
		int result = 1;
		result = prime * result + recurrence;
		return result;
	}
 
	@Override
	public boolean equals(Object obj) {
		if (this == obj)
			return true;
		if (obj == null)
			return false;
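		// instanceof rather than getClass(): repeatOn() and escape() return
		// anonymous subclasses, which must still compare equal by value
		if (!(obj instanceof WeeklyRecurrence))
			return false;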
		WeeklyRecurrence other = (WeeklyRecurrence) obj;
		if (recurrence != other.recurrence)
			return false;
		return true;
	}
}