Grails/Groovy for the frontend
When I started building my service, I knew I would have to choose a technology for the frontend. One of the big challenges is the large number of frameworks and languages to choose from: servlets + jsps, spring mvc, PHP, ruby on rails, grails, and many more, including, as a last resort, building my own. Each solution has its advantages and drawbacks. One of the big issues with this choice is that it is hard (or at least costly) to switch from one to another once the choice has been made. Even if they are close to each other in concept, they each have their own way of doing things (including the language, like PHP vs Grails/Groovy...). Also, depending on the technology, you get more or less: for example, Grails is much more than a UI framework, since it comes with a powerful ORM (Object Relational Mapping) layer.
In the end, making the decision was not too difficult as I was able to come up with some boundaries and constraints:
- Ability to talk to java back-end services (with extra bonus if I could collocate them at will in the same process).
- Fast development lifecycle
- Simplified access to the database
PHP has become very popular lately and although I was not afraid of learning a new language (after all, it was one of the goals of this project), it did not seem to satisfy all my constraints (note that I may be wrong, as I lack deep knowledge of this technology). I quickly narrowed it down to RoR (Ruby on Rails) and Grails. I believe both frameworks are very similar in concept, both offering very fast turnaround, scaffolding, an ORM layer...
My final decision to use Grails was motivated by the fact that the language used is Groovy and Groovy is java! So I knew I would not have too much trouble learning it and, more importantly, integrating with any java code or backend server would be a breeze. To bootstrap the process I read two excellent books (which I strongly recommend): Groovy in Action and The Definitive Guide to Grails, Second Edition.
After using Grails for several months, I am pretty happy with the choice. I have not used the ORM capabilities yet but I was able to successfully deploy my java backend services in the same VM (by wiring them directly in spring): autowiring them into the bootstrap sequence and the controllers made it a breeze. Working with controllers and gsp pages (the Grails equivalent of jsp) has been relatively painless and I certainly appreciate the ability to simply modify the source code and see the changes right away when refreshing the page in the browser (which satisfies the fast development lifecycle constraint). I really like the extensibility of the gsp layer, which lets you write your own tag library (you can check my post about grails tag libraries): it is pretty straightforward and very powerful.
But in the end, the reason why I am really glad I made this choice is actually the Groovy language. I did not know it beforehand and I am very excited to add it to my toolbox. As I was mentioning before, Groovy is java. Groovy compiles to java bytecode and integrates seamlessly with java, which is huge for someone who knows java quite well. What it brings to java is big, and when I go back to java I always wish some of its features were native to java: I love closures, duck typing and builders, to name a few. Groovy allows you to declare types if you want to, but unlike pure java, it is not enforced. There is always a battle between the two camps, but I prefer to be neutral and use whichever makes more sense in what I am currently working on.
// small example showing how concise and efficient the syntax can be
// duck typing in action:
// * this method will work on any object which has a method sort(Closure closure)
// * as long as the elements in the list know how to handle '.name' it will work
//   as well (which covers properties, getters, and even maps, since map["name"]
//   can be written map.name!)
static def sortByName(list)
{
  list.sort() { e1, e2 ->
    return e1.name.compareTo(e2.name)
  }
}
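For comparison, here is roughly what the same method looks like in plain java. This is only a sketch: the `Named` interface and the `SortByNameSketch` class are hypothetical names introduced for the illustration. Without duck typing, the elements must share an explicit type that promises a `getName()` method, so maps or unrelated classes that merely happen to have a name no longer qualify.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical interface: java needs an explicit type that promises getName().
interface Named {
    String getName();
}

final class SortByNameSketch {
    // Sorts the list in place by name and returns it for convenience.
    static <T extends Named> List<T> sortByName(List<T> list) {
        Collections.sort(list, new Comparator<T>() {
            @Override
            public int compare(T e1, T e2) {
                return e1.getName().compareTo(e2.getName());
            }
        });
        return list;
    }
}
```

The comparison logic is the same single line as in the Groovy version; everything else is type declarations.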
Since my initial exposure to Groovy through Grails, I have actually started using it in other areas:
- For 'shell' scripts (which I used to write in bash, perl or python): you get all the power of java and groovy for writing small scripts so no need to switch to a totally different syntax.
- For testing: this is definitely an area where I really don't see any advantage in writing my unit tests in java anymore even if what I am testing is my java code. It is less code to write and it is more readable.
OSGi at LinkedIn (EclipseCon 2009)
My presentation about Building LinkedIn’s Next Generation Architecture with OSGi is live on the EclipseCon web site (slides + audio). Here is the abstract:
Over the course of the last 5 years, LinkedIn has been built using relatively simple technologies: front end web applications (tomcat/servlet/jsp), backend services (jetty/spring remoting), databases, replication, jms. Although the web site was scaling adequately, LinkedIn had some big challenges to overcome: In March of 2008, a group of Senior Engineers started a project to explore the best available technologies which could help in building the next generation of the architecture that would address those challenges. The new architecture involved using OSGI/Spring DM as the foundation because it had the right properties we were interested in. The code was migrated to a more modular paradigm using binary consumption. This session will demonstrate how we integrated OSGi, the pros and cons of the changes, the pain points as well as the migration strategy.
Improving performances of a Lucene Search
Lucene is a popular java based text search engine. You add documents to the index using an IndexWriter and then you can search the index using an IndexSearcher. In order to search, the most flexible api is to use the callback api:
indexSearcher.search(query, new HitCollector() {
  public void collect(int docID, float score) {
    // do whatever you want...
  }
});
For every document which matches the query, Lucene calls the hit collector with the document id of the match as well as the score. This document id is internal to Lucene and cannot be relied upon as it can (and will) change (for example when optimizing the index). The usual practice is to add a field to the document that you index which contains an id which has meaning in your application:
Document doc = new Document();
doc.add(new Field("ID", String.valueOf(myID),
Field.Store.YES, Field.Index.NOT_ANALYZED));
This is useful as well when you want to update/delete the document from the index:
indexWriter.deleteDocuments(new Term("ID", String.valueOf(myID)));
In the callback loop, you can then retrieve your id from the Lucene document id by doing indexSearcher.doc(docID) which returns a Document from which you can simply extract your previously stored id.
This works fine, but is relatively expensive. Lucene is very good at caching the index in memory, but the stored document is not part of that cache, so retrieving it requires a disk access. Depending on how many documents match the query, this can have serious implications for performance.
When I mentioned that you cannot rely on the Lucene id because it changes, that is true. Nonetheless, while the index searcher is open and until you close it, this id will not change. We can use this property to add some caching which will improve performance quite a bit. The idea is that when you open the searcher, you simply 'read' and cache all your ids in memory (and you discard the cache when you close it):
// the array is indexed by the Lucene doc id: myCache[docID] is your own id
String[] myCache = FieldCache.DEFAULT.getStrings(searcher.getIndexReader(), "ID");
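The principle does not depend on Lucene itself, so here is a minimal sketch of it in plain java. `DocIdCache` and the `slowLookup` function are hypothetical stand-ins, not Lucene APIs: `slowLookup` plays the role of the expensive per-hit document read, and `maxDoc` is the number of document slots in the open index.

```java
import java.util.function.IntFunction;

// Hypothetical stand-in for the cache built when the searcher is opened (NOT a Lucene class).
final class DocIdCache {
    private final String[] idsByDocId;

    // Pay the expensive read once per document, up front.
    DocIdCache(int maxDoc, IntFunction<String> slowLookup) {
        idsByDocId = new String[maxDoc];
        for (int docId = 0; docId < maxDoc; docId++) {
            idsByDocId[docId] = slowLookup.apply(docId);
        }
    }

    // Constant-time array lookup, cheap enough to call for every hit.
    String applicationId(int docId) {
        return idsByDocId[docId];
    }
}

final class DocIdCacheDemo {
    public static void main(String[] args) {
        final int[] slowReads = {0};
        DocIdCache cache = new DocIdCache(3, docId -> {
            slowReads[0]++; // count how often the "disk" is touched
            return "app-" + (100 + docId);
        });

        // Simulated hits from several queries: none of them triggers a slow read.
        for (int docId : new int[] {2, 0, 2, 1}) {
            System.out.println(docId + " -> " + cache.applicationId(docId));
        }
        System.out.println("slow reads: " + slowReads[0]);
    }
}
```

The cache is only valid for the lifetime of one open searcher, so it should be rebuilt whenever the searcher is reopened.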
To make things nicer, I hid all of this under some apis and created my own wrapper:
// a hit collector with userData
public interface LuceneHitCollector<T> {
  void collect(int doc, float score, T userData);
}

// wraps a lucene searcher to use the new hit collector
public class LuceneIndexSearcherImpl<T> implements LuceneIndexSearcher<T>
{
  private final IndexSearcher _indexSearcher;
  private final T[] _userData;

  public LuceneIndexSearcherImpl(IndexSearcher indexSearcher, T[] userData) {
    _indexSearcher = indexSearcher;
    _userData = userData;
  }

  public LuceneHitCollector<T> search(Query query, final LuceneHitCollector<T> collector)
    throws IOException {
    _indexSearcher.search(query, new HitCollector() {
      public void collect(int doc, float score) {
        collector.collect(doc, score, _userData[doc]);
      }
    });
    return collector;
  }
}
The performance improvements are quite dramatic: a query that used to take around 350ms is now taking about 14ms! Pretty nice. Of course this works well only if you open your searcher and keep it open for several queries, which is the case in my application. This technique requires some extra memory but if you can afford it, it is totally worth it.
Note that the api I created is using generics: I wanted to be able to use the payload feature if I later need to store more than the id. For example, if I wanted to store an id and a timestamp, I could create a small serializable object and store it (serialized) as a byte array in the payload of the field. When I open the searcher, I could read all the payloads and deserialize them into an array of the correct object type (instead of an array of Strings like in the example). To create an array of the proper size you can simply use the searcher.maxDoc() api. The code to use the payload feature is a little cumbersome/complicated and would require too much code to demonstrate in this post.
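As a rough illustration of the id-plus-timestamp case, here is a sketch in plain java. It deliberately swaps java serialization for a fixed 16-byte encoding to keep the example short, and it leaves out the Lucene payload api entirely. `IdAndTimestamp`, `PayloadCacheSketch` and the `payloadsByDocId` array (standing in for payloads already read from the index, one slot per doc id) are all hypothetical names.

```java
import java.nio.ByteBuffer;

// Hypothetical value object: an application id plus a timestamp.
final class IdAndTimestamp {
    final long id;
    final long timestamp;

    IdAndTimestamp(long id, long timestamp) {
        this.id = id;
        this.timestamp = timestamp;
    }

    // Encode as 16 bytes: 8 for the id, then 8 for the timestamp.
    byte[] toBytes() {
        return ByteBuffer.allocate(16).putLong(id).putLong(timestamp).array();
    }

    // Decode bytes produced by toBytes().
    static IdAndTimestamp fromBytes(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        long id = buffer.getLong();
        long timestamp = buffer.getLong();
        return new IdAndTimestamp(id, timestamp);
    }
}

final class PayloadCacheSketch {
    // Builds the user data array, indexed by doc id.
    // A null payload (for example a deleted document) leaves a null slot.
    static IdAndTimestamp[] decodeAll(byte[][] payloadsByDocId) {
        IdAndTimestamp[] userData = new IdAndTimestamp[payloadsByDocId.length];
        for (int docId = 0; docId < payloadsByDocId.length; docId++) {
            byte[] payload = payloadsByDocId[docId];
            if (payload != null) {
                userData[docId] = IdAndTimestamp.fromBytes(payload);
            }
        }
        return userData;
    }
}
```

The resulting array has the shape that the userData argument of the wrapper's constructor expects, with T being IdAndTimestamp instead of String.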
This post presents one solution to improve the performance of a Lucene search, but there are many other techniques. It definitely works if you have a little extra memory to spare. I want to thank the LinkedIn search team for the inspiration!
git for source control management
When I started working on my project, I needed to set up an scm (source control management). One of the goals of the project was to look at new technologies and trends and see how well they can solve some issues I have encountered throughout my career. I am familiar with 2 popular scms: cvs and svn.
At LinkedIn, we started with cvs. Although widely popular, it has several shortcomings, mainly non-transactional commits and very heavy branching/tagging operations. Transactional commits are quite important as they ensure that whenever you check out code, you always get a consistent view and not a partial commit that somebody else is currently checking in (or even worse, if two people are checking in at the same time). "Consistent" is obviously from the point of view of the scm, as a developer may still check in an inconsistent set of files, for example by forgetting to check in one of them. cvs fails in that regard as no consistency is guaranteed. Branching and tagging in cvs are painful because they happen at the file level, so the bigger the project is, the longer they take.
With a rapidly growing project and team, cvs was becoming unmanageable. We then moved to svn, which solved the two main issues I was talking about. With svn, commits are transactional: every commit gets an ever-increasing commit number which refers to the entire commit. Either the entire commit goes through or none of it does, and another developer will never see an inconsistent view. Branching and tagging are very lightweight operations in svn as it does not copy the entire tree.
After using svn for a while, although it is an improvement over cvs, I am still pretty unhappy with it as there are several big issues, ranging from losing the history of commits (mild problem) to losing changes when files are moved around on a different branch (really bad). Also, although branching is much easier/faster than in cvs, the syntax to merge is quite complicated and error prone because you constantly have to figure out the correct commit numbers to use. To be fair, some of those issues are being addressed in more recent versions (although it is still not there yet).
In my quest for something better, I started hunting around and clearly the new trend is Distributed SCM. Open source projects are the perfect example of highly distributed development environments, where a centralized scm (like cvs or svn) shows its limitations. Hence the need for an scm that can handle this kind of project. I believe that the most prominent distributed scms today are git and mercurial. Both projects were started around the same time as a free/open source response to BitKeeper, which was not going to be free anymore. git was created by Linus Torvalds for handling the Linux kernel development. mercurial was created by Matt Mackall.
On paper, despite some minor differences, the concepts are essentially the same. Not having any preconceived ideas about either of them, I decided to use git for my own project mainly because of the IDE support: it has very good support in IntelliJ IDEA. I am by no means an expert in git but I can share my experience after using it for several months now.
svn: creating branches, switching between them, merging,... is all a breeze. Nonetheless, there is something that takes time to get used to: the 'staging' area, which is definitely not very intuitive at first, especially coming from a different model. Let's take an example: you start modifying a file, then you want to commit the changes. You must first move the file to the staging area by issuing 'git add <file>'. If you modify the file again before committing, you must add it to the staging area again. In the IDE, you don't have to worry about any of this, which makes it very transparent. On the command line, you just need to be a little bit more careful, and it has become a habit for me to always issue a 'git status' command, which tells you the state of your changes and which ones still need to be moved to the staging area.
svn for example, I would have to be able to somehow connect to my desktop to be able to continue working... But I don't have any of these issues with git at all: I simply cloned my repository on my laptop before leaving, which you can do over ssh very easily as it is built-in: 'git clone user@machine:/path/to/repo'. I can then work on my laptop during the entire time I am away, creating branches and committing as many times as I want. When I come back I will simply issue a 'git push' command (yes that is it... I don't even have to tell it where to push to!!) to move all the changes I have made back onto my desktop with full history! Really neat! By the way, since it is all local, it is also perfect for plane work!
git to facilitate the creation of open source projects. By lowering the barriers to be able to contribute to open source projects, I think it has great potential to become quite a nice platform.
I have not tried mercurial at all so I do not have anything of my own to share. I have heard people complaining about the fact that mercurial does not allow you to rewrite the history whereas git does, and depending on which camp you are in, it may be good or bad. I don't mind that git allows you to do that, and if you don't like it then you just don't use the feature.
As I was mentioning previously, distributed scm is the new trend. After trying it for myself for a while I have become very excited about it and I really don't believe it is just a fad. My personal opinion is that it is the future. Even for non distributed projects (like mine at the moment), the benefits are obvious. I do not regret the decision I have made as it allowed me to see the potential of this emerging technology! Distributed scm is a brilliant idea and I invite you to try it out: once you make the switch you will not want to go back.