Connecting to a local VM using JMX, knowing the process id
On the project I am currently working on at LinkedIn, I needed to programmatically access the JMX interface of a Java VM. The caller is another program written in Java/Groovy and knows the process id of the VM it wants to talk to. Note that both VMs are running on the same host. jconsole does exactly that, so it should be pretty straightforward. In the end it is not very complicated, but getting there took me several hours of head scratching and debugging.
The page Monitoring and Management Using JMX Technology describes a technique that lets you attach to another virtual machine using com.sun.tools.attach.VirtualMachine. This class is not part of the standard JDK 1.6 API but is SUN-specific: it ships with the SUN JDK (in lib/tools.jar), so if you use a SUN VM it is available. It is also part of the OpenJDK project.
Extracting the JMXServiceURL (groovy):
import javax.management.remote.JMXServiceURL

private static final String CONNECTOR_ADDRESS =
  "com.sun.management.jmxremote.localConnectorAddress";

private JMXServiceURL extractJMXServiceURL(pid)
{
  // attach to the target application (requires the JDK's lib/tools.jar on the classpath)
  com.sun.tools.attach.VirtualMachine vm =
    com.sun.tools.attach.VirtualMachine.attach(pid.toString());
  try
  {
    // get the connector address
    String connectorAddress =
      vm.getAgentProperties().getProperty(CONNECTOR_ADDRESS);

    // no connector address, so we start the JMX agent
    if (connectorAddress == null) {
      String agent = vm.getSystemProperties().getProperty("java.home") +
                     File.separator + "lib" + File.separator +
                     "management-agent.jar";
      vm.loadAgent(agent);

      // agent is started, get the connector address
      connectorAddress =
        vm.getAgentProperties().getProperty(CONNECTOR_ADDRESS);
    }

    // build the URL of the connector server (no connection is made yet)
    return new JMXServiceURL(connectorAddress);
  }
  finally
  {
    vm.detach();
  }
}
Once you have the JMXServiceURL, you need a JMXConnector to get to the MBeanServerConnection:
def connector = JMXConnectorFactory.connect(url)
def connection = connector.getMBeanServerConnection()
// use the connection...
When I tried this approach, it worked fine in my development environment, but when I deployed it on a test machine I got the following exception:
Caused by: com.sun.tools.attach.AttachNotSupportedException: Unable to open door: target process not responding or HotSpot VM not loaded
  at sun.tools.attach.SolarisVirtualMachine.<init>(SolarisVirtualMachine.java:68)
  at sun.tools.attach.SolarisAttachProvider.attachVirtualMachine(SolarisAttachProvider.java:42)
  at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:195)
  at com.sun.tools.attach.VirtualMachine$attach.call(Unknown Source)
Quite an unusual error message if you are not familiar with Solaris (a "door" is a Solaris inter-process communication mechanism)... The underlying issue is that attaching to another VM only works with JDK 1.6: on the test machine I was trying to attach from a 1.6 VM to a 1.5 VM, and that does not work (note that in my case I have no choice and need to run with both VMs).
To fix this issue, I needed an approach that works with both VMs. There is another internal API that can be used to extract the JMXServiceURL, the class sun.management.ConnectorAddressLink:
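import javax.management.remote.JMXServiceURL

// assumes a `log` field (a logger) in the enclosing class
private JMXServiceURL extractJMXServiceURL(pid)
{
  String serviceURL = null
  try
  {
    // looks up the connector address the target VM published for this pid
    serviceURL = sun.management.ConnectorAddressLink.importFrom(pid as int)
  }
  catch(IOException e)
  {
    log.warn("Cannot find process ${pid}", e)
  }

  if(serviceURL == null)
    return null
  else
    return new JMXServiceURL(serviceURL)
}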
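Whichever technique produces the JMXServiceURL, the connecting side is the same as in the Groovy snippet earlier. Here is a minimal Java sketch of that side, assuming you already hold the URL. `JmxClient` and `readVmName` are illustrative names, and reading the VmName attribute of the standard java.lang:type=Runtime MBean is just an example of something to do with the connection. The connector is closed in a finally block so that each call releases its connection.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper: connect to the VM behind `url`, read one attribute, disconnect.
class JmxClient {
    static String readVmName(JMXServiceURL url) throws Exception {
        // connect() opens a real connection, which must be closed explicitly
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            // "java.lang:type=Runtime" is the platform RuntimeMXBean; VmName is one of its attributes
            ObjectName runtime = new ObjectName(ManagementFactory.RUNTIME_MXBEAN_NAME);
            return (String) connection.getAttribute(runtime, "VmName");
        } finally {
            connector.close();
        }
    }
}
```

From Groovy this could be called as JmxClient.readVmName(extractJMXServiceURL(pid)), keeping in mind that the second extractJMXServiceURL returns null when the process cannot be found.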
Something to keep in mind is that there is a difference between 1.5 and 1.6:
This solution worked great in my development environment but failed again in my testing environment. I spent close to 3 hours on trial and error: the issue now was an "IOException: process not found" error when calling the importFrom method. At some point, I realized that jconsole was not listing my java processes and that the jps command was not returning anything either.
I then ran my test program under the truss command (Solaris), which logs all the system calls, and realized that the method looks for a file called /tmp/hsperfdata_<username>/<pid> (where username is the user executing the unix process). This folder was empty, which is why nothing was returned. I later realized that the permissions on the folder were wrong and were preventing the VM from writing its pid file there. The frustrating part is that the failure was totally silent: it was never reported in any log file and there was no way to turn on a debugging level to see the error. If it weren't for truss, I am not certain how I could have figured this out, since the behavior is undocumented and fails silently. Changing the permissions on the folder immediately fixed the problem!
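Since this failure is silent, a small check of that folder could save a truss session next time. Below is a minimal Java sketch, assuming the layout observed above (one file per VM, named after its pid, under <tmp>/hsperfdata_<username>). HsPerfDataCheck is a hypothetical helper, not a JDK API, and the tmp directory is passed in because it may not be /tmp on every platform. An empty list is exactly the symptom described above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic: list the pids published under <tmpDir>/hsperfdata_<user>.
class HsPerfDataCheck {
    static List<Integer> publishedPids(File tmpDir, String user) {
        File dir = new File(tmpDir, "hsperfdata_" + user);
        if (!dir.isDirectory()) {
            throw new IllegalStateException(dir + " does not exist (or is not a directory)");
        }
        // a VM that cannot write here publishes nothing, and nothing logs it
        if (!dir.canWrite()) {
            System.err.println("warning: " + dir + " is not writable by the current user");
        }
        File[] entries = dir.listFiles();
        if (entries == null) { // listing failed, e.g. missing read permission
            throw new IllegalStateException("cannot list " + dir);
        }
        List<Integer> pids = new ArrayList<>();
        for (File entry : entries) {
            String name = entry.getName();
            if (name.matches("\\d+")) { // pid files are named after the pid
                pids.add(Integer.valueOf(name));
            }
        }
        Collections.sort(pids);
        return pids;
    }
}
```

A typical call would be HsPerfDataCheck.publishedPids(new File("/tmp"), System.getProperty("user.name")).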
This is a very good demonstration of why the following pattern:
try
{
  // do something which may throw an exception but if it does I will ignore
  // and continue
}
catch(Exception e)
{
  // ok ignored
}
is a very bad pattern and should be replaced with something like:
try
{
  // do something which may throw an exception but if it does I will ignore
  // and continue
}
catch(Exception e)
{
  // ok ignored
  if(log.isDebugEnabled())
    log.debug("ignored exception", e)
}
Configuring apache -> tomcat load balancer
Now that kiwidoc has been released, I can share how I configured the system in 'production'. kiwidoc is hosted at Rackspace on 2 machines: a small one for the load balancer (apache web server) and a bigger one for the main application (tomcat). Configuring it was quite a challenge, so I want to share how I did it. Note that the instructions are for Ubuntu 9.04 with a stock installation of apache (2.2.11-2ubuntu2.3) and tomcat (6.0.18-0ubuntu6) using the standard apt* commands.
What did I want to achieve?
My main application is a web application deployed in tomcat under [/java]. The load balancer (apache) should be able to direct traffic to multiple instances of tomcat when the need arises. I also wanted http://www.kiwidoc.com/ (in other words [/]) to be redirected to [/java/], which is my main entry point. The catch is that some pages need to be served by apache (like some error pages), and this was not easy to configure.
Configuring tomcat (Part I)
On the tomcat side, I set up a new AJP connector (file /etc/tomcat6/server.xml):
<!-- Define an AJP 1.3 Connector on port 8009 -->
<Connector port="8009" protocol="AJP/1.3" redirectPort="8010"
           proxyName="www.kiwidoc.com" proxyPort="80" URIEncoding="UTF-8"/>
I chose AJP because it is supposed to be much faster than plain HTTP. So far the configuration is not too difficult. proxyName and proxyPort are used so that the methods ServletRequest.getServerName() and ServletRequest.getServerPort() return the public host name and port (www.kiwidoc.com and 80) rather than those of the tomcat machine.
Configuring apache
On the apache side, I enabled the proxy modules by adding these 4 links (directory /etc/apache2/mods-enabled):
proxy_ajp.load -> ../mods-available/proxy_ajp.load
proxy.load -> ../mods-available/proxy.load
proxy.conf -> ../mods-available/proxy.conf
proxy_balancer.load -> ../mods-available/proxy_balancer.load
Then, under /etc/apache2/sites-enabled, I have the following file (which I called 100-lb):
<VirtualHost *:80>
  ##########################
  # DocumentRoot
  DocumentRoot /var/www
  <Directory /var/www/>
    Options FollowSymLinks MultiViews
    AllowOverride None
    Order allow,deny
    allow from all
  </Directory>

  ##########################
  # Error handling
  ErrorLog /var/log/apache2/error.log
  LogLevel warn
  CustomLog /var/log/apache2/access.log combined
  ErrorDocument 503 /errors/error_503.html
  ErrorDocument 404 /errors/error_404.html

  ##########################
  # Proxy
  ProxyRequests Off
  <Proxy *>
    Order deny,allow
    Allow from all
  </Proxy>
  <Proxy balancer://kiwidoc>
    BalancerMember ajp://123.123.123.123:8009
    BalancerMember ajp://123.123.123.124:8009
  </Proxy>
  ProxyPass /errors !
  ProxyPass /images !
  ProxyPass / balancer://kiwidoc/
</VirtualHost>
Let's cover each section:
- The first 2 ProxyPass lines define exclusion rules: all requests to [/errors] and [/images] are served by apache and not forwarded (this is required because of the 3rd line).
- The last line sends all traffic for [/] to the balancer.
- The static content should be served by apache (which does it very efficiently).
- If all tomcat instances are unreachable, apache issues a 503 error code, which gets mapped to [/errors/error_503.html]; without the exclusion rule, apache would try to fetch that page from tomcat (which we know is unreachable...). This happens, for example, when I need to shut down the main application for maintenance: you see a nice maintenance page instead.
Configuring tomcat (Part II)
We are almost there. The issue now is that [/] goes to tomcat, which needs to handle it properly. So here is what I did:
Under /var/lib/tomcat6/webapps/ROOT (which is what tomcat uses for [/]), I have a mini webapp:
WEB-INF/web.xml
---------------
<web-app xsi:schemaLocation='http://java.sun.com/xml/ns/j2ee
http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd'
version='2.4'
xmlns='http://java.sun.com/xml/ns/j2ee'
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
<error-page>
<error-code>404</error-code>
<location>/errors/error_404.html</location>
</error-page>
</web-app>
index.jsp
---------
<% response.sendRedirect("http://www.kiwidoc.com/java/"); %>
errors/error_404.html
Let's cover each section:
Conclusion
First of all, it works, achieves what I described earlier and, I believe, covers all cases like maintenance mode and 'not found' error pages. Nonetheless, I wish I had found a cleaner/simpler way to do it, in other words one that did not require a ROOT webapp on the tomcat side. The main issue stems from the fact that I was unable to express in apache the simple rule: redirect [/] ONLY to [/java/], because the [/] rule is treated as [/*]. I would be surprised if it were not possible; it is just very hard to find the documentation that explains how to do it.
pongasoft presents... kiwidoc
Back in April, I introduced the new adventure I was embarking on with pongasoft. It has been quite a ride but I am very proud to finally be able to release the service: it is called kiwidoc. You can find more information about the service itself on the About page including a description of all the main features that I was able to implement for the beta release. Here is the introduction:
In a few words, kiwidoc can be described as javadoc on steroids. The main goal is to help software developers quickly find information about java libraries in a single location:
- proximity search and typeahead allow you to quickly locate what you are looking for.
- IDE-style display shows you the relevant information in a familiar format.
- immediate access to additional information such as library dependencies, manifest, OSGi headers, etc.
- the private view can provide even more details if you need to dig deeper (for example, to extend a library or better understand its internals).
CSS for the UI design
In a previous post, I talked about using Grails/Groovy to build the front-end. The final rendering still needs to be pretty/attractive. The old days of using tables and spacer GIFs to make a page look good have thankfully been replaced by CSS. Not being a web designer myself, I had only very rudimentary notions of CSS. I used it a little to build/tweak the blog you are reading, but I must admit I was kind of flying blind, as I neither fully understood it nor knew enough to be efficient. The result, although nice looking (at least to my taste :)), certainly left something to be desired, and there are many things I wanted to do that I could not achieve (just check how the right sidebar seems to be detached and disappears if the window gets too small...). I hunted for books about CSS and ended up buying 2:
The real cost of high-speed internet in the US
I just came back from a trip to France to visit my family, where I helped my dad with his internet connection, so I spent quite some time looking at what is available there and at what price. Back in the US, I found out that my high speed (Comcast) internet bill had gone up. Looking closely at the bill, it is not that they raised their prices, but that my promotional period is unfortunately over.
I dug a little deeper to see if I could get a better deal. The thing that really annoys me is how Comcast advertises in big letters that High Speed internet is 'only' $19.99 a month. Of course it is a promotional offer, and if you look closely at the terms and conditions, there is already a catch: you must subscribe to another of their services to even get the promotion. Their cheapest service is basic cable TV, which adds $15.39 a month for a total of $35.38 (a 77% increase over the advertised price!). And when the promotion ends, the rate goes up to $58.34 (close to 3x the price advertised in big letters). In my case I don't even care about cable TV, as I use Dish, but I have to pay for it. Note that if you do not want basic cable, high speed internet is $58.95, which is even more expensive. I am sure the practice is legal, but I think it is a big scam to advertise a price that will triple in 6 months...
So is there a better deal? Not really. I looked around and the only other real choice in my area is DSL for over $45 a month, at speeds 10x slower. The second thing that really annoys me is that I do not live in a rural area! I am in the middle of Silicon Valley (Mountain View), the heart of most of the big internet services!
As I mentioned, I ended up doing some research in France, where you can get TV + Internet (20M) + unlimited phone (including calls to other countries) for about 30 euros a month (roughly $42). And I know that this is still expensive, as some Northern European countries offer high speed internet for $7 a month.
Why is it so bad in the US? The reason I can think of is monopoly. The reality is that in my area nothing besides cable internet comes close in terms of speed. Since cable is owned by Comcast, I just don't have a choice: if I want the speed, I have to cough up the cash (and even if I don't want the speed, the DSL alternatives are also extremely expensive for what they offer). In France, France Telecom used to have a monopoly on phone lines, but laws forced the company to open its lines to competitors, and this has driven competition like crazy: there are many offers to choose from, and that has driven prices down. In French, this is called dégroupage (local loop unbundling): "The law requires (in France and in the European Union) the incumbent operator to provide unbundled access to the local loop to alternative operators. [...] Unbundling makes, in particular, real competition possible in commercial ADSL offers and thus allows retail prices to drop."
In the end, I think it is a pretty grim reality, because roughly $60 a month for high speed internet seriously limits penetration. This has a big impact on companies like YouTube and others that rely on fast internet access to provide their service, and ultimately on the economy. Some companies (e.g., Comcast) are benefiting from this monopoly, but the real cost is the dive in the worldwide ranking of broadband use (from #1 to #15), with the long term consequences that it will have for the country on a global scale. My only hope today is that the new administration will finally wake up and do something about it.
Grails/Groovy for the frontend
When I started building my service, I knew I would have to choose a technology for the frontend. One of the big challenges is the sheer number of frameworks and languages to choose from: servlets + jsps, spring mvc, PHP, ruby on rails, grails, and many more, including, as a last resort, building my own. Each solution has its advantages and drawbacks. I think one of the big issues with the choice is that it is hard (or at least costly) to switch from one to another once the choice has been made. Even if they are close to each other in concept, they each have their own ways of doing things (including the language, like PHP vs Grails/Groovy...). Also, depending on the technology, you get more or less: for example, Grails is much more than a UI framework, with its powerful ORM layer (Object Relational Mapping).
In the end, making the decision was not too difficult as I was able to come up with some boundaries and constraints:
- Ability to talk to java back-end services (with extra bonus if I could co-locate them at will in the same process).
- Fast development lifecycle
- Simplified access to the database
PHP has become very popular lately, and although I was not afraid of learning a new language (after all, it was one of the goals of this project), it did not feel like it satisfied all my constraints (note that I may be wrong due to my lack of deep knowledge of this technology). I quickly narrowed it down to ROR (Ruby on Rails) and Grails. I believe both frameworks are very similar in concept, both offering very fast turnaround, scaffolding, an ORM layer...
My final decision to use Grails was motivated by the fact that the language used is Groovy, and Groovy is java! So I knew I would not have too much trouble learning it and, more importantly, integrating with any java code or backend server would be a breeze. To bootstrap the process I read two excellent books (which I strongly recommend): Groovy in Action and The Definitive Guide to Grails, Second Edition.
After using Grails for several months, I am pretty happy with the choice. I have not used the ORM capabilities yet, but I was able to successfully deploy my java backend services in the same VM (by wiring them directly in spring): autowiring them into the bootstrap sequence and the controllers made it a breeze. Working with controllers and gsp pages (the Grails equivalent of jsp) has been relatively painless, and I certainly appreciate being able to simply modify the source code and see the changes right away when refreshing the page in the browser (which satisfies the fast development lifecycle constraint). I really like how extensible the gsp layer is, since you can write your own tag library (you can check my post about grails tag libraries): it is pretty straightforward and very powerful.
But in the end, the reason why I am really glad I made this choice is actually the Groovy language. I did not know it beforehand and I am very excited to add it to my toolbox. As I mentioned before, Groovy is java: it compiles to Java bytecode and integrates seamlessly with Java code, which is huge for someone who knows java quite well. What it brings to java is significant, and when I go back to java, I always wish some of its features were native: I love closures, duck typing and builders, to name a few. Groovy lets you declare static types if you want to, but unlike pure java, it does not enforce them. There is always a battle between the two camps, but I prefer to be neutral and use whichever makes more sense for what I am currently working on.
// small example showing how concise and efficient the syntax can be
// duck typing in action:
// * this method works on any object which has a method sort(Closure closure)
// * as long as the elements in the list know how to handle '.name', it works
//   as well (which covers properties, getters, and even maps, since map["name"]
//   can be written map.name!)
static def sortByName(list)
{
  list.sort() { e1, e2 ->
    return e1.name.compareTo(e2.name)
  }
}
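For comparison, here is roughly what the same method takes in Java. Even with Java 8 lambdas, Java needs a static type that promises a name, so this sketch introduces a hypothetical Named interface. Unlike the Groovy version, it only accepts lists whose elements implement that interface: maps or unrelated objects that merely have a name property are rejected at compile time.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical interface: the static promise that Groovy's duck typing does without.
interface Named {
    String getName();
}

class Sorting {
    // Sorts the list in place by name and returns it.
    static <T extends Named> List<T> sortByName(List<T> list) {
        Comparator<Named> byName = Comparator.comparing(Named::getName);
        list.sort(byName);
        return list;
    }
}
```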
Since my initial exposure to Groovy through Grails, I have actually started using it in other areas:
- For 'shell' scripts (which I used to write in bash, perl or python): you get all the power of java and groovy for writing small scripts, so there is no need to switch to a totally different syntax.
- For testing: this is definitely an area where I really don't see any advantage in writing my unit tests in java anymore even if what I am testing is my java code. It is less code to write and it is more readable.
OSGi at LinkedIn (EclipseCon 2009)
My presentation about Building LinkedIn’s Next Generation Architecture with OSGi is live on the EclipseCon web site (slides + audio). Here is the abstract:
Over the course of the last 5 years, LinkedIn has been built using relatively simple technologies: front end web applications (tomcat/servlet/jsp), backend services (jetty/spring remoting), databases, replication, jms. Although the web site was scaling adequately, LinkedIn had some big challenges to overcome: In March of 2008, a group of Senior Engineers started a project to explore the best available technologies which could help in building the next generation of the architecture that would address those challenges. The new architecture involved using OSGI/Spring DM as the foundation because it had the right properties we were interested in. The code was migrated to a more modular paradigm using binary consumption. This session will demonstrate how we integrated OSGi, the pros and cons of the changes, the pain points as well as the migration strategy.
Improving the performance of a Lucene search
Lucene is a popular java-based text search engine. You add documents to the index using an IndexWriter and then search the index using an IndexSearcher. To search, the most flexible API is the callback API:
indexSearcher.search(query, new HitCollector() {
  public void collect(int docID, float score) {
    // do whatever you want...
  }
});
For every document that matches the query, Lucene calls the hit collector with the document id of the match as well as its score. This document id is internal to Lucene and cannot be relied upon, as it can (and will) change (for example when optimizing the index). The usual practice is to add to each indexed document a field containing an id that has meaning in your application:
Document doc = new Document();
doc.add(new Field("ID", String.valueOf(myID),
Field.Store.YES, Field.Index.NOT_ANALYZED));
This is useful as well when you want to update/delete the document from the index:
indexWriter.deleteDocuments(new Term("ID", String.valueOf(myID)));
In the callback loop, you can then map the Lucene document id back to your id by calling indexSearcher.doc(docID), which returns a Document from which you can simply extract your previously stored id.
This works fine, but is relatively expensive. Lucene is very good at caching the index in memory, but the document itself is not part of that cache, so loading it requires a disk access. Depending on how many documents match the query, this can have serious implications for performance.
It is true that you cannot rely on the Lucene id because it changes. Nonetheless, while the index searcher is open and until you close it, this id does not change. We can use this property to add some caching, which improves performance quite a bit. The idea is that when you open the searcher, you simply 'read' and cache all your ids in memory (and you discard the cache when you close it):
String[] myCache = FieldCache.DEFAULT.getStrings(searcher.getIndexReader(), "ID");
// the array is indexed by Lucene doc id: myCache[docID] is your application id
To make things nicer, I hid all of this behind a small API and created my own wrapper:
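import java.io.IOException;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// a hit collector with userData
public interface LuceneHitCollector<T> {
  void collect(int doc, float score, T userData);
}

// minimal shape of the wrapper interface (only what the implementation needs)
public interface LuceneIndexSearcher<T> {
  LuceneHitCollector<T> search(Query query, LuceneHitCollector<T> collector)
    throws IOException;
}

// wraps a lucene searcher to use the new hit collector
public class LuceneIndexSearcherImpl<T> implements LuceneIndexSearcher<T>
{
  private final IndexSearcher _indexSearcher;
  // _userData[doc] holds the cached data for Lucene doc id `doc` (e.g. the FieldCache array)
  private final T[] _userData;

  public LuceneIndexSearcherImpl(IndexSearcher indexSearcher, T[] userData) {
    _indexSearcher = indexSearcher;
    _userData = userData;
  }

  public LuceneHitCollector<T> search(Query query, final LuceneHitCollector<T> collector)
    throws IOException {
    _indexSearcher.search(query, new HitCollector() {
      public void collect(int doc, float score) {
        collector.collect(doc, score, _userData[doc]);
      }
    });
    return collector;
  }
}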
The performance improvements are quite dramatic: a query that used to take around 350ms now takes about 14ms! Pretty nice. Of course, this only works well if you open your searcher and keep it open for several queries, which is the case in my application. This technique requires some extra memory, but if you can afford it, it is totally worth it.
Note that the API I created uses generics: I wanted to be able to use the payload feature later on if I need to store more than the id. For example, if I wanted to store an id and a timestamp, I could create a small serializable object and store it (serialized) as a byte array in the payload of the field. When I open the searcher, I could read all the payloads and deserialize them into an array of the correct object type (instead of an array of Strings like in the example). To create an array of the proper size, you can simply use the searcher.maxDoc() API. The code to use the payload feature is a little cumbersome/complicated and would require too much code to demonstrate in this blog.
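The Lucene payload code itself is too long to show, but the surrounding idea (encode a small object per document, then decode everything into an array indexed by Lucene doc id when the searcher opens) can be sketched in plain Java. IdAndTimestamp and buildCache are hypothetical names; the map of payload bytes stands in for the Lucene code that actually reads payloads from the index (not shown), and maxDoc would come from searcher.maxDoc(). Instead of Java serialization, the sketch uses a fixed DataOutputStream encoding, which keeps each payload small.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;

// Hypothetical per-document payload: an application id plus a timestamp.
final class IdAndTimestamp {
    final String id;
    final long timestamp;

    IdAndTimestamp(String id, long timestamp) {
        this.id = id;
        this.timestamp = timestamp;
    }

    // Encodes this object into the bytes that would be stored as the field payload.
    byte[] toBytes() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(id);
            out.writeLong(timestamp);
        }
        return bytes.toByteArray();
    }

    static IdAndTimestamp fromBytes(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new IdAndTimestamp(in.readUTF(), in.readLong());
        }
    }

    // Builds the userData array for the searcher wrapper: slot i holds the
    // decoded payload of Lucene doc id i (null if that doc has no payload).
    static IdAndTimestamp[] buildCache(Map<Integer, byte[]> payloadsByDocId, int maxDoc)
            throws IOException {
        IdAndTimestamp[] cache = new IdAndTimestamp[maxDoc];
        for (Map.Entry<Integer, byte[]> e : payloadsByDocId.entrySet()) {
            cache[e.getKey()] = fromBytes(e.getValue());
        }
        return cache;
    }
}
```

The resulting array could then be handed to a LuceneIndexSearcherImpl<IdAndTimestamp> in place of the String[] from FieldCache.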
This post presents one way to improve the performance of a Lucene search, but there are many other techniques. It definitely works if you have a little extra memory to spare. I want to thank the LinkedIn search team for the inspiration!
git for source control management
When I started working on my project, I needed to set up an scm (source control management). One of the goals of the project was to look at new technologies and trends and see how well they solve some issues I have encountered throughout my career. I am familiar with 2 popular scms: cvs and svn.
At LinkedIn, we started with cvs. Although widely popular, it has several shortcomings, mainly non-transactional commits and very heavy branching/tagging operations. Transactional commits are quite important, as they ensure that whenever you check out code, you always get a consistent view and not a partial commit that somebody else is currently checking in (or, even worse, a mix of two people checking in at the same time). "Consistent" is obviously from the point of view of the scm, as a developer may still check in an inconsistent set of files, for example by forgetting one of them. cvs fails in that regard, as no consistency is guaranteed. Branching and tagging in cvs are painful because they happen at the file level, so the bigger the project, the longer they take.
With a rapidly growing project and team, cvs was becoming unmanageable. We then moved to svn, which solves the two main issues I was talking about. With svn, commits are transactional: every commit gets an ever-increasing revision number that refers to the entire commit. Either the entire commit goes through or nothing does, and another developer will never see an inconsistent view. Branching and tagging are very lightweight operations in svn, as they do not copy the entire tree.
After using svn for a while, although it is an improvement over cvs, I am still pretty unhappy with it, as there are several big issues ranging from losing the history of commits (mild problem) to losing changes when files are moved around on a different branch (really bad). Also, although branching is much easier/faster than in cvs, the merge syntax is quite complicated and error prone, since you constantly have to figure out the correct revision numbers to use. To be fair, some of those issues are being addressed in more recent versions (although it is still not there yet).
In my quest for something better, I started hunting around, and clearly the new trend is distributed scm. Open source projects are the perfect example of highly distributed development environments, and a centralized scm (like cvs or svn) is showing its limitations there, hence the need for an scm that handles this kind of project. I believe that the most prominent distributed scms today are git and mercurial. Both projects were started around the same time as a free/open source response to BitKeeper, which was no longer going to be free. git was created by Linus Torvalds to handle Linux kernel development. mercurial was created by Matt Mackall.
On paper, despite some minor differences, the concepts are essentially the same. Not having any preconceived ideas about either of them, I decided to use git for my own project, mainly because of the IDE support: it is very well supported in IntelliJ IDEA. I am by no means an expert in git, but I can share my experience after using it for several months now.
svn: creating branches, switching between them, merging,... is all a breeze. Nonetheless, there is something that takes time to get used to: the 'staging' area, which is definitely not very intuitive at first, especially coming from a different model. Let's take an example: you modify a file, then you want to commit the changes. You must first add the file to the staging area by issuing 'git add <file>'. If you modify the file again before committing, you must add it to the staging area again. In the IDE, you don't have to worry about any of this, which makes it very transparent. On the command line, you just need to be a little more careful, but it has become a habit for me to always issue a 'git status' command, which tells you the state of your changes and which ones still need to be staged.
svn for example, I would have to somehow connect to my desktop to be able to continue working... But I don't have this issue with git at all: I simply cloned my repository onto my laptop before leaving, which you can do over ssh very easily as it is built-in: 'git clone user@machine:/path/to/repo'. I can then work on my laptop the entire time I am away, creating branches and committing as many times as I want. When I come back, I simply issue a 'git push' command (yes, that is it... I don't even have to tell it where to push to!!) to move all the changes I have made back onto my desktop with full history! Really neat! By the way, since it is all local, it is also perfect for working on a plane!
git to facilitate the creation of open source projects. By lowering the barriers to contributing to open source projects, I think it has great potential to become quite a nice platform.
I have not tried mercurial at all, so I do not have anything of my own to share. I have heard people complain that mercurial does not allow you to rewrite history whereas git does, and depending on which camp you are in, it may be good or bad. I don't mind that git allows you to do that, and if you don't like it, you just don't use the feature.
As I was mentioning previously, distributed scm is the new trend. After trying it for myself for a while I have become very excited about it and I really don't believe it is just a fad. My personal opinion is that it is the future. Even for non distributed projects (like mine at the moment), the benefits are obvious. I do not regret the decision I have made as it allowed me to see the potential of this emerging technology! Distributed scm is a brilliant idea and I invite you to try it out: once you make the switch you will not want to go back.
Version Management and OSGi
In a previous post on the LinkedIn blog, I talked about bundle repositories. In this one I am going to cover version management and the particularities of OSGi.
How does OSGi handle versions?
In OSGi you define your dependencies using headers in the Manifest. There are several ways to define dependencies (Import-Package, Require-Bundle, etc.). One way to constrain a dependency is to use a version. A version in OSGi is defined as Major.Minor.Micro.qualifier.
Examples of versions:
1
1.0
2.4.5.ABC_DEF
- Major, Minor and Micro must be numbers; missing components default to 0 (so 1 is the same as 1.0.0). The qualifier is a string (with some constraints) and is not interpreted as a number (check the javadoc for the details). OSGi does not attach any further meaning to the numbers; it is up to the user to manage them the way they want.
- When defining a dependency, you can use a version or a version range.
version=1.0.0
does not mean that you depend on version 1.0.0; it means that you depend on 1.0.0+, that is, anything greater than or equal to 1.0.0 will match!
If you really want to express that you depend on 1.0.0 and nothing else, it is expressed this way:
version=[1.0.0,1.0.0]
Examples of ranges:
version=1 => means v >= 1.0.0
version=[1.1.0,2) => means 1.1.0 <= v < 2.0.0
version=(1,2] => means 1.0.0 < v <= 2.0.0
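A minimal sketch of the version and range rules above, in plain Java for illustration. OsgiVersion and OsgiRange are hypothetical names (the framework's own class is org.osgi.framework.Version); the sketch skips the validation the specification requires, such as rejecting negative numbers or illegal qualifier characters.

```java
// Hypothetical model of OSGi version semantics (not the framework's own classes).
final class OsgiVersion implements Comparable<OsgiVersion> {
    final int major, minor, micro;
    final String qualifier;

    OsgiVersion(int major, int minor, int micro, String qualifier) {
        this.major = major;
        this.minor = minor;
        this.micro = micro;
        this.qualifier = qualifier;
    }

    // "1" -> 1.0.0, "1.0" -> 1.0.0, "2.4.5.ABC_DEF" -> 2.4.5 with qualifier ABC_DEF
    static OsgiVersion parse(String s) {
        String[] parts = s.trim().split("\\.", 4);
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0; // omitted -> 0
        int micro = parts.length > 2 ? Integer.parseInt(parts[2]) : 0; // omitted -> 0
        String qualifier = parts.length > 3 ? parts[3] : "";
        return new OsgiVersion(major, minor, micro, qualifier);
    }

    @Override
    public int compareTo(OsgiVersion o) {
        if (major != o.major) return Integer.compare(major, o.major);
        if (minor != o.minor) return Integer.compare(minor, o.minor);
        if (micro != o.micro) return Integer.compare(micro, o.micro);
        return qualifier.compareTo(o.qualifier); // qualifier compared as a plain string
    }

    @Override
    public String toString() {
        return major + "." + minor + "." + micro + (qualifier.isEmpty() ? "" : "." + qualifier);
    }
}

final class OsgiRange {
    // True if v satisfies a version attribute such as "1.0.0", "[1.1.0,2)" or "(1,2]".
    static boolean includes(String range, OsgiVersion v) {
        String r = range.trim();
        char first = r.charAt(0);
        if (first != '[' && first != '(') {
            // A bare version is only a floor: anything >= it matches.
            return v.compareTo(OsgiVersion.parse(r)) >= 0;
        }
        char last = r.charAt(r.length() - 1);
        String[] bounds = r.substring(1, r.length() - 1).split(",");
        int cLow = v.compareTo(OsgiVersion.parse(bounds[0]));
        int cHigh = v.compareTo(OsgiVersion.parse(bounds[1]));
        boolean aboveLow = first == '[' ? cLow >= 0 : cLow > 0;   // '[' inclusive, '(' exclusive
        boolean belowHigh = last == ']' ? cHigh <= 0 : cHigh < 0; // ']' inclusive, ')' exclusive
        return aboveLow && belowHigh;
    }
}
```

For example, OsgiRange.includes("[1.1.0,2)", OsgiVersion.parse("2.0.0")) is false because the upper bound is exclusive, while the bare "1.0.0" accepts any higher version.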
Versioning convention
As mentioned previously, OSGi does not attach any meaning to the various components of a version. Here is the convention that seems to have been adopted by some open source projects.
- The Major number represents a major version: it is assumed that there is no backward compatibility between two versions of the same bundle whose major numbers differ. Usually it means that APIs have changed in an incompatible way (this would be the case, for example, if Java serialized objects had been changed in a way that changes their serial version ID).
- The Minor number represents a version which contains backward compatible changes, for example the addition of a method to an interface, or new classes and objects not present in a previous version.
- The Micro number represents a version which is also backward compatible but does not contain any api enhancements. It is usually used for bug fixes and minor improvements.
- The qualifier is a string and is being used for various purposes depending on the project.
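Following this convention, choosing the next version becomes mechanical. A minimal sketch, with hypothetical VersionBump and Change names and versions simplified to {major, minor, micro}; resetting the lower components on a bump is an assumption of the sketch (common practice), not something OSGi mandates.

```java
// Hypothetical helper applying the Major/Minor/Micro convention described above.
// Versions are int[]{major, minor, micro}; the qualifier is ignored.
final class VersionBump {
    enum Change { INCOMPATIBLE, COMPATIBLE_API, FIX_ONLY }

    static int[] next(int[] current, Change change) {
        switch (change) {
            case INCOMPATIBLE:   // breaking API change: bump major, reset the rest
                return new int[] {current[0] + 1, 0, 0};
            case COMPATIBLE_API: // backward compatible API addition: bump minor
                return new int[] {current[0], current[1] + 1, 0};
            default:             // bug fix or minor improvement, no API change: bump micro
                return new int[] {current[0], current[1], current[2] + 1};
        }
    }

    static String format(int[] v) {
        return v[0] + "." + v[1] + "." + v[2];
    }
}
```

With it, adding a method to the 3.0.0 API, as in the upgrade scenario that follows, is a COMPATIBLE_API change and yields 3.1.0.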
Upgrading version
Let's take the following example:

There is a service which is exposed as a Java interface. This Java interface resides in the bundle api-3.0.0. A service does not really have a version, but since it uses this api it is fair to say that the version of the service is 3.0.0. The bundle impl-3.0.2 provides the implementation of the service and exports it to the OSGi registry. There are two clients of the service (client1 and client2). They both depend on the api. There is also another external bundle (called lib-2.0.0) which happens to be used by both clients, both directly (in their code) and indirectly, because the api exposes some objects from this library.
Service API (3.0.0)
-------------------
void f(FromLib200 param1);
Upgrade scenario
Let's now assume we enhance the service in a backward compatible way by offering a new api (a new method on the Java interface).
Service API (3.1.0)
-------------------
void f(FromLib200 param1);
void g(FromLib210 param1);
The new service API actually uses a new class which was not defined in the previous version of lib, thus requiring a new lib-2.1.0. We then assume that client1 uses this new enhanced api while client2 is left unchanged. Of course there needs to be a new implementation for this new api.
Upgrade results with minimal version lockdown
In this first case, we assume that we lock down versions only to protect against incompatible changes. In other words, we use ranges like these, locking down only the major version number:
client1: api;version=[3.1.0,4),lib;version=[2.1.0,3)
client2: api;version=[3.0.0,4),lib;version=[2.0.0,3)
api-3.0.0: lib;version=[2.0.0,3)
api-3.1.0: lib;version=[2.1.0,3)
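The effect of these minimal lockdown ranges on client2 can be sketched in plain Java. MinimalLockdown is a hypothetical helper, versions are simplified to {major, minor, micro} arrays, and it assumes the resolver picks the highest matching version (the usual preference on a fresh resolve or after a refresh); a real framework also takes existing wiring into account.

```java
import java.util.List;

// Illustrative model of "minimal lockdown": a dependency on x.y.z becomes [x.y.z,(x+1)).
// Versions are int[]{major, minor, micro}; qualifiers are ignored.
final class MinimalLockdown {
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        }
        return 0;
    }

    // Inclusive floor at the required version, exclusive ceiling at the next major.
    static boolean matches(int[] required, int[] candidate) {
        int[] ceiling = {required[0] + 1, 0, 0};
        return compare(candidate, required) >= 0 && compare(candidate, ceiling) < 0;
    }

    // Assumption: the highest matching version wins; returns null if nothing matches.
    static int[] pick(int[] required, List<int[]> available) {
        int[] best = null;
        for (int[] v : available) {
            if (matches(required, v) && (best == null || compare(v, best) > 0)) best = v;
        }
        return best;
    }
}
```

Fed the installed api-3.0.0 and api-3.1.0, client2's requirement of 3.0.0 resolves to 3.1.0, and its lib requirement of 2.0.0 likewise resolves to 2.1.0 once lib-2.1.0 is installed.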
Here is the result:

Although client2 has not been updated, it is going to start using the new service. Since the new service is backward compatible, this is not really an issue per se. What is an issue, though, is that it is also going to start using lib-2.1.0. Why exactly is that an issue? In a very dynamic production environment (like LinkedIn's), this scenario is very frequent. The danger is that simply deploying a new version of a service ends up affecting a client in a way that has most likely not been tested.
Upgrade results with maximal version lockdown
In this second case, we assume that we lock down the version entirely. In other words, we use ranges like these, locking down major, minor and micro:
client1: api;version=[3.1.0,3.1.1),lib;version=[2.1.0,2.1.1)
client2: api;version=[3.0.0,3.0.1),lib;version=[2.0.0,2.0.1)
api-3.0.0: lib;version=[2.0.0,2.0.1)
api-3.1.0: lib;version=[2.1.0,2.1.1)
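The same simplified model with the maximal lockdown ranges (MaximalLockdown is again a hypothetical helper under the same assumptions) shows the flip side: client2 can only ever resolve against api-3.0.0 (and likewise lib-2.0.0), so it is shielded from lib-2.1.0 but also never sees the 3.1.0 service types.

```java
import java.util.List;

// Illustrative model of "maximal lockdown": a dependency on x.y.z becomes [x.y.z,x.y.(z+1)).
// Versions are int[]{major, minor, micro}; qualifiers are ignored.
final class MaximalLockdown {
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        }
        return 0;
    }

    // Inclusive floor at the required version, exclusive ceiling at the next micro.
    static boolean matches(int[] required, int[] candidate) {
        int[] ceiling = {required[0], required[1], required[2] + 1};
        return compare(candidate, required) >= 0 && compare(candidate, ceiling) < 0;
    }

    // Assumption: the highest matching version wins; returns null if nothing matches.
    static int[] pick(int[] required, List<int[]> available) {
        int[] best = null;
        for (int[] v : available) {
            if (matches(required, v) && (best == null || compare(v, best) > 0)) best = v;
        }
        return best;
    }
}
```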
Here is the result:

Is there a solution then?
As we mentioned previously, client2 should be able to talk to the new service because it is backward compatible. The only reason it cannot is class loading. If we were in separate containers we would not really have this problem (we would use Spring RPC, which does Java serialization). So the idea is to replicate what happens when we are remote:

We can deploy a service which uses the Service 3.0.0 api and proxies all the calls to the real service (we know that, due to backward compatibility, the API of Service 3.0.0 is a subset of 3.1.0, so we should be able to proxy all calls). Due to class loading issues, the calls must go through Java serialization (exactly like what would happen if it was remote...): in other words, we serialize all parameters with the class loader which loaded Service 3.0.0 and deserialize them with the one which loaded Service 3.1.0 (and vice versa for the return value/exceptions).
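A minimal sketch of this idea in plain Java, assuming the proxy has access to both class loaders. SerializingBridge, copyAcross and proxy are hypothetical names, the OSGi service lookup and registration are left out, only Serializable values can cross, and exceptions thrown by the real service are not mapped back in this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Sketch of the "remote-style" proxy: values cross class loaders via Java serialization.
final class SerializingBridge {

    // ObjectInputStream that resolves classes against a chosen class loader.
    private static final class LoaderAwareInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        LoaderAwareInputStream(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            try {
                return Class.forName(desc.getName(), false, loader);
            } catch (ClassNotFoundException e) {
                return super.resolveClass(desc); // default lookup, e.g. primitive types
            }
        }
    }

    // Serializes value, then deserializes it so its classes come from targetLoader.
    static Object copyAcross(Object value, ClassLoader targetLoader)
            throws IOException, ClassNotFoundException {
        if (value == null) return null;
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new LoaderAwareInputStream(
                new ByteArrayInputStream(bytes.toByteArray()), targetLoader)) {
            return in.readObject();
        }
    }

    // Exposes target (new API, targetLoader) through oldApi (visible to callerLoader).
    // Methods are matched by name and parameter type names; arguments are copied into
    // targetLoader and return values copied back into callerLoader.
    @SuppressWarnings("unchecked")
    static <T> T proxy(Class<T> oldApi, Object target,
                       ClassLoader callerLoader, ClassLoader targetLoader) {
        return (T) Proxy.newProxyInstance(callerLoader, new Class<?>[] {oldApi}, (p, m, args) -> {
            Class<?>[] oldTypes = m.getParameterTypes();
            Class<?>[] newTypes = new Class<?>[oldTypes.length];
            Object[] newArgs = new Object[oldTypes.length];
            for (int i = 0; i < oldTypes.length; i++) {
                newTypes[i] = oldTypes[i].isPrimitive()
                        ? oldTypes[i]
                        : Class.forName(oldTypes[i].getName(), false, targetLoader);
                newArgs[i] = copyAcross(args[i], targetLoader);
            }
            Method real = target.getClass().getMethod(m.getName(), newTypes);
            return copyAcross(real.invoke(target, newArgs), callerLoader);
        });
    }
}
```

The design mirrors the remote case on purpose: because every value is rebuilt from bytes, neither side ever holds an instance of a class loaded by the other side's class loader.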
Conclusion
Solution 2 (maximal lockdown) is not going to work. The choice is then between Solution 1 and Solution 3. Solution 3 (the serializing proxy) is not supported out of the box by OSGi and requires writing the proxy and the mechanisms to do the serialization / class loader transfer, which is not necessarily an easy piece of code to write. Solution 1 (minimal lockdown) is most likely the one that is going to be used in the end, and that is fine as long as we are careful and aware of the 'dangers' of deploying more than one service in the same container. Distributed OSGi (RFC 119) is coming up and I think they will have to address some of these issues across containers (the ability to upgrade a remote service to a newer backward compatible version without having to change the clients). So the point I was making is still valid: if it is going to work across containers, it should also work in the same container (which is essentially Solution 3)...
Grails - Invoking a tag lib from another tag lib
Grails comes with a predefined set of tags that you can use in your gsp pages. If you want to add your own tags, it is pretty simple and you can check the Dynamic Tag Libraries reference documentation. I created my own version of the <g:each> tag which allows you to provide begin, end and separator attributes:
class MyTagLib {
  static namespace = 'my'

  // Equivalent to g:each but allows for begin/end and separator attributes
  def each = { attrs, body ->
    def var = attrs.var ?: "var"
    def begin = attrs.begin ?: ""
    def end = attrs.end ?: ""
    def separator = attrs.separator ?: "" // avoid writing null when no separator is given
    def writer = out
    if(attrs.in)
    {
      // not null and not empty (definition of truth in groovy)
      attrs.in.eachWithIndex { elt, i ->
        if(i == 0)
        {
          writer << begin
        }
        else
        {
          writer << separator
        }
        writer << body((var):elt)
      }
      writer << end
    }
    else
    {
      // null or empty list: only output begin/end when explicitly requested
      if(attrs.alwaysBeginEnd?.toString() == "true")
      {
        writer << begin << end
      }
    }
  }
}
Here are some examples of rendering in gsp:
<my:each in="${[1,2,3]}" var="i">${i}</my:each>
produces: 123
<my:each in="${[1,2,3]}" var="i" begin="{" end="}" separator=",">${i}</my:each>
produces: {1,2,3}
<my:each in="${[1]}" var="i" begin="{" end="}" separator=",">${i}</my:each>
produces: {1}
<my:each in="${[]}" var="i" begin="{" end="}" separator=",">${i}</my:each>
produces:
<my:each in="${[]}" var="i" begin="{" end="}" separator="," alwaysBeginEnd="true">${i}</my:each>
produces: {}
This tag is pretty convenient as it automatically handles an empty list, or one that has only one element, so that the separator and the begin and end attributes are displayed properly. The last example shows how you can force the begin and end attributes to be displayed when the list is empty.
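For readers coming from Java, java.util.StringJoiner (Java 8 and later) has the same begin/end/separator semantics, including a configurable value for the empty case. EachJoin is a hypothetical wrapper that mirrors the tag's attributes; it is not part of Grails.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical Java counterpart of my:each's begin/end/separator handling.
final class EachJoin {
    static String join(List<?> items, String begin, String end, String separator,
                       boolean alwaysBeginEnd) {
        StringJoiner joiner = new StringJoiner(separator, begin, end);
        if (!alwaysBeginEnd) {
            // By default an empty joiner prints begin + end; the tag prints nothing instead.
            joiner.setEmptyValue("");
        }
        for (Object item : items) {
            joiner.add(String.valueOf(item));
        }
        return joiner.toString();
    }
}
```

The outputs match the gsp examples above: {1,2,3} for three elements, {1} for one, an empty string for an empty list, and {} when alwaysBeginEnd is true.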
Now, let's say I want to create another tag which will reuse the code I already wrote. In other words, I need to call a tag from within a tag. Here is how I would do it:
def csv = { attrs, body ->
  def var = attrs.var ?: "var"
  out << my.each(in: attrs.in, var: 'v', separator: ',') { map ->
    def elt = map.v
    out << "{"
    out << body((var):elt)
    out << "}"
  }
}
And here is the rendering in gsp:
<my:csv in="${[1,2,3]}" var="i">[${i}]</my:csv>
produces: {[1]},{[2]},{[3]}
It is actually not that trivial to call a tag from within a tag (and to my knowledge it is not documented)... let's cover each detail:
- referencing another tag uses the notation namespace.tagName (ex: my.each)
- simply calling the other tag is not enough: the result must be sent to the writer (ex: out << my.each(...))
- the attributes are passed in as a map, so you simply use the groovy map notation (ex: (in: attrs.in, var: 'v', separator: ','))
- now the really tricky part is the closure, which corresponds to the children tags in gsp... the argument that you get is a map (because in the my.each code, the body closure is called with a map!). Although it makes sense, it is not that obvious because in gsp you don't see it. This is why I need to use map.v to access the element being iterated over (the key is v because it is the name I used in the call: my.each(..., var: 'v', ...)).
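To make the map argument less mysterious, here is a plain-Java model (not Grails code; TagModel and its methods are hypothetical) in which each tag is a function and the body receives a map keyed by the var name, exactly as my.each does when it calls body((var):elt).

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of tag composition: bodies receive a map keyed by the var name.
final class TagModel {
    static String each(List<?> in, String var, String separator,
                       Function<Map<String, Object>, String> body) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < in.size(); i++) {
            if (i > 0) out.append(separator);
            // the body only sees {var: element}, never the raw element
            out.append(body.apply(Collections.<String, Object>singletonMap(var, in.get(i))));
        }
        return out.toString();
    }

    // Counterpart of the csv tag: calls each with var "v" and reads map.get("v") back.
    static String csv(List<?> in, String var, Function<Map<String, Object>, String> body) {
        return each(in, "v", ",", map ->
                "{" + body.apply(Collections.<String, Object>singletonMap(var, map.get("v"))) + "}");
    }
}
```

Calling TagModel.csv(Arrays.asList(1, 2, 3), "i", m -> "[" + m.get("i") + "]") produces the same {[1]},{[2]},{[3]} as the gsp rendering above.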
Although a little tricky to write, it is very powerful to be able to create tags that build upon other tags. There is one little caveat in how null is being handled and I opened a Jira ticket for it (GRAILS-4449) as it does not seem to be consistent.
Starting from scratch... domain name and web hosting
When I started having the idea of a project that I could build, two things popped up right away: I needed a domain name and I needed some sort of web hosting solution. To be honest, both topics were totally brand new to me and I spent a significant amount of time investigating what is available.
Getting a domain name sounds like a trivial thing to do, but it turned out to be not such an easy task for two reasons:
- Finding a 'good' domain name is important and of course most of them are already taken. A big portion of them are simply unavailable because people own them in the hope of reselling them for profit. Once you have decided on a domain name, figuring out whether it is available is thankfully an easy task, as every single name registrar lets you check.
- There are a lot of websites that offer domain name registration, all competing to be the flashiest. They all seem to have tons of different options or programs, which adds to the confusion.
I ended up settling on Network Solutions for the company's domain name (pongasoft.com) because it had been recommended as reliable, they offer a privacy feature (so that your private information is not exposed in the whois database, albeit for an extra $10) and I could see the potential of using their web hosting solution and 'Servlet' support. Clearly they are not the cheapest, but I needed to start somewhere.
Once I had a domain name, I started looking for web hosting. I think it is probably even worse than domain name hunting, as what is offered is far more diverse. On top of that, most providers add to the confusion by bundling a free domain name (under some conditions). In the end it was pretty difficult for me to choose, also because I was not too familiar with the terminology and with what you could, and most importantly could not, do unless you actually tried it. It would definitely have helped to be really clear about what I wanted, but I wasn't sure at the time.
I decided to try the web hosting solution provided by my name registrar (Network Solutions) because it offered blogging and 'Servlet' support. After giving it a shot for a while, I realized that the 'Servlet' support is really a joke. First of all, finding documentation for it was a big challenge. And once I was able to experiment with it, it became clear that it was going nowhere: the only thing you could do was literally create a Servlet (= a class) and drop it into their web container... You just cannot deploy a war, so you cannot have any dependency on anything that is not already in the container. Very, very limited in what you can actually do.
The second one I tried was Oxxus.net. They offer your own private container, which you can start and stop at will. They have different kinds of web hosting, some of them including a blog, but the one that offers tomcat does not include one. That was not much of an issue since you can install a blog application in tomcat (the blog you are currently reading runs on Apache Roller within tomcat :) ). What really ticked me off is how lax they were with security. First of all, the very first email they sent me after registration contained passwords in clear text in the body (and we all know how secure email is...). Although not a good practice, it is not much of an issue if you can just change them yourself, which I did. However, to be able to use scp or sftp, they told me to log in on their website with elevated privileges (as a non-chroot-jailed user) and, to my surprise, to do that I needed the same password they had sent me earlier, even after I had changed it on the website. In other words, they have a 'backdoor' which is fully accessible with passwords sent in clear text that you cannot change... that was enough for me.
Finally the one I settled with and the one currently running this blog is RimuHosting (I found a lot of good comments on the web about them). I was very impressed right away by how much they care about security and how good their documentation is. Just check out their HowTo section and you will understand what I am talking about. What I have now is my own private VPS (Virtual Private Server). So I can do whatever I want on it. Install and run whatever kind of program. No restrictions. I must admit that I was a little bit afraid at first to have to manage a full blown OS since I am not a system administrator. Nonetheless, with the help of their very down to earth documentation and the Webmin interface, it turned out to be not a problem at all. Their support is very good too and they can help you set up whatever you want. I am currently very satisfied with this choice.
I investigated AWS (Amazon Web Services) but it is definitely a more expensive solution if you want to have a server up and running all the time (it costs 10c an hour... hence about $70 a month if you run it all the time). I think it is better suited if you can bring it up and down whenever you want to use it, but not if you want to run a blog or an email server, which has to be up 100% of the time.
I recently bought another domain name for my product and this time I went with a company called Gandi. They were recommended by RimuHosting and they offer the same privacy protection offered by Network Solutions, except for free...
In conclusion, to be fair to the companies I tried for web hosting, they all offered a 30-day money-back guarantee. And it really works! I had no problem getting my money back. So I think it is OK to experiment if you don't know exactly what will work for you and what will not. Usually it does not take that long to figure out whether you are happy or not.
Grails - Proper shutdown in dev mode
For my main project I am using Grails for the front-end (I will explain in an upcoming post why I chose this technology in the first place). Grails has a very interesting development mode which allows you to keep working on your application and see the changes right away. To start the application you usually issue the command:
grails run-app
or, if you use maven:
mvn grails:run-app
To shut it down (for restarting, for example), you press CTRL-C, which terminates the process. Grails uses the Spring framework to bootstrap your application. It also allows you to define your own beans. However, I noticed that when terminating the application, the beans that I had registered with a destroy-method were not being shut down properly (the destroy method is simply not called). I tried to find a way to change this default behavior but did not find anything. I then implemented my own shutdown solution in this manner:
I created a simple class in grails-app/utils/com/mypackage/ShutdownHook.groovy which registers a VM-wide shutdown hook when it gets called by Spring (ApplicationContextAware)
package com.mypackage

import org.springframework.context.ApplicationContextAware
import org.springframework.context.ApplicationContext
import org.apache.commons.logging.Log
import org.apache.commons.logging.LogFactory

public class ShutdownHook implements ApplicationContextAware
{
  public static final Log log = LogFactory.getLog(ShutdownHook.class)

  public void setApplicationContext(ApplicationContext applicationContext)
  {
    // runs when the VM exits (e.g. on CTRL-C): closing the context
    // destroys the singleton beans, which calls their destroy methods
    Runtime.runtime.addShutdownHook {
      log.info("Application context shutting down...")
      applicationContext.close()
      log.info("Application context shutdown.")
    }
    log.info("Shutdown hook setup...")
  }
}
Then I added the following block in grails-app/conf/spring/resources.groovy which conditionally creates the bean only in development mode (thanks to groovy Spring DSL!).
if(grails.util.GrailsUtil.isDevelopmentEnv())
{
  myShutdownHook(com.mypackage.ShutdownHook)
}
It works really well: my beans now get properly destroyed when the application terminates. Nonetheless, it would be better if this were part of the Grails framework by default. I opened a Jira ticket (GRAILS-4404) for it.
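Outside of Grails and Spring, the same mechanism is just a JVM shutdown hook. A minimal plain-Java sketch, where CloseOnShutdown is a hypothetical name and any AutoCloseable stands in for the application context:

```java
import java.util.logging.Logger;

// Plain-Java version of the idea: close a resource when the JVM exits (e.g. on CTRL-C).
final class CloseOnShutdown {
    private static final Logger LOG = Logger.getLogger(CloseOnShutdown.class.getName());

    // Registers the hook and returns its thread so callers can remove it later if needed.
    static Thread register(AutoCloseable resource) {
        Thread hook = new Thread(() -> {
            LOG.info("Shutting down...");
            try {
                resource.close();
                LOG.info("Shutdown complete.");
            } catch (Exception e) {
                LOG.warning("Error during shutdown: " + e);
            }
        }, "close-on-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Note that shutdown hooks run on normal exit and on signals such as the one sent by CTRL-C, but not if the VM is killed forcibly.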
pongasoft.... a new adventure
One thing that I have been wanting to talk about is the 'story' behind pongasoft. It all began during the 2008 holidays, when I realized that I had been working for LinkedIn for a rather long time (you could say that over 6 years is an eternity in startup lingo ;) ). LinkedIn is a cool company, don't get me wrong, and the technologies behind it are very cool and also bleeding edge (OSGi for example). Nonetheless, you are still pretty constrained in what you can and cannot do for practical reasons. Also, in 6 years things have changed quite drastically in some areas: for example, web hosting is widely available and affordable, cloud computing is changing the face of how you can bootstrap an idea and scale it with little investment upfront, and new (dynamic) languages like Groovy and Scala are changing the landscape and offering new ways to implement ideas.
While brainstorming with my partner I had an idea for a tool that I wanted to build and decided to give it a shot. The purpose would be to essentially start from scratch with no preconceived ideas and try to use as much open source software and as many new technologies as possible. I have been very lucky that LinkedIn allowed me to work part-time, since it lets me concentrate on my own project while still earning a portion of my income. I have been working on the tool for about 3 months now and it has been an incredible learning experience: from domain name purchase to cloud computing (barely started yet), the range of technologies involved is pretty large. I think the main drawback is that progress is slower than I anticipated, which can sometimes be frustrating, but the journey in itself is awesome.
I am planning to share a lot of this experience in subsequent blog posts and of course release the [beta version] of the tool at some point in the near future for feedback. I am excited to see if people would be interested. If not then I won't be regretting a thing as what matters to me the most is the adventure.
Welcome to the software cookbook!
Hello kind reader. I guess it is tradition to introduce oneself in the first post on a blog. So here I go. My name is Yan Pujante. I am originally from France and currently living in Silicon Valley, close to San Francisco. I am a software engineer at heart and have been since I was 11 years old... a long time ago :). If you are interested in learning a lot more about me, you can always check my fairly complete profile on LinkedIn.
In this blog I am planning to write mainly about software, with an emphasis on sharing my experiences with the new technologies I am working with both at LinkedIn and pongasoft. I hope you find it interesting. Don't hesitate to leave feedback!