How to fix NAMESPACE_ERR with Nokogiri::XML::Builder in JRuby

note from Wed Jul 19, 2017

In a recent project, I ran into a curious incompatibility between Nokogiri and JRuby.

I won't get into the details, but in short, Nokogiri::XML::Builder would raise the following Java exception:

ActionView::Template::Error (NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.):
  org.apache.xerces.dom.ElementNSImpl.setName(Unknown Source)
  org.apache.xerces.dom.ElementNSImpl.<init>(Unknown Source)
  org.apache.xerces.dom.CoreDocumentImpl.createElementNS(Unknown Source)
  nokogiri.XmlNode.init(XmlNode.java:340)
  nokogiri.XmlNode.rbNew(XmlNode.java:298)
  nokogiri.XmlNode$INVOKER$s$0$0$rbNew.call(XmlNode$INVOKER$s$0$0$rbNew.gen)
  org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:218)

After a bit of debugging and hours of research, it turned out to come from creating nodes with namespaces. Whereas Nokogiri uses libxml2 on Ruby MRI, on JRuby it uses the Xerces Java XML parser instead. Xerces seems to be stricter about how XML is constructed, and is particular about namespaces. The error is essentially raised because Xerces does not allow a namespace prefix to be written as part of the element name (as in 'prefix:name') when the element is created.

In the normal course of building an XML document, the code might look something like:
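
To confirm which of the two backends a given process is likely to be using, the Ruby engine can be checked at runtime. This is only a minimal sketch: the helper name xml_backend is made up for illustration, and the mapping simply restates the libxml2/Xerces split described above rather than asking Nokogiri itself.

```ruby
# Hypothetical helper: reports which XML backend Nokogiri is expected to use,
# based on the libxml2 (MRI) / Xerces (JRuby) split described in this post.
def xml_backend
  RUBY_ENGINE == 'jruby' ? :xerces : :libxml2
end

puts "Ruby engine: #{RUBY_ENGINE}, expected XML backend: #{xml_backend}"
```

A check like this can be handy in a test suite, to make sure namespace-heavy builder code is exercised on JRuby as well as MRI.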

builder = Nokogiri::XML::Builder.new do |xml|
  xml.para 'this is some text'
end

If the element is namespaced, one might write:

builder = Nokogiri::XML::Builder.new do |xml|
  xml.send 'myDoc:para', namespace_def_as_hash, 'text in the namespaced element'
end

This generally works on Ruby MRI, but under JRuby, Xerces treats it as an illegal name and refuses to create the element. What it objects to is the prefixed name:

'myDoc:para'

Instead, use one of the following options to specify the namespace when creating a namespaced element:

This generally works, and also copes with element names that contain characters not allowed in a Ruby method name:

xml['myDoc'].send 'para'

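However, if your tag name happens to be 'p', it won't work: .send 'p' returns nil instead of creating the element. This is Ruby itself rather than Rails. Every object has a private Kernel#p method (the inspect-and-print helper), and send ignores method visibility, so the call lands on Kernel#p and never reaches the builder's method_missing.

This works better: by calling method_missing on the builder directly, we bypass any existing methods, mixins, or monkey patches that would otherwise take priority: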
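
The 'p' behaviour can be reproduced without Nokogiri or Rails. The sketch below uses a hypothetical TagRecorder class as a stand-in for the builder; like the builder, it turns unknown method names into tags through method_missing. It is an illustration of the Ruby dispatch rules involved, not Nokogiri's actual implementation.

```ruby
# Hypothetical stand-in for a builder: every unknown method name becomes a tag.
# TagRecorder is made up for this example and is not part of Nokogiri.
class TagRecorder
  def method_missing(name, *_args)
    "<#{name}/>"
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

recorder = TagRecorder.new

# 'para' is not defined anywhere, so send falls through to method_missing.
puts recorder.send('para').inspect

# Kernel#p is a private method available on every object, and send ignores
# visibility, so this calls Kernel#p with no arguments, which returns nil.
puts recorder.send('p').inspect

# Calling method_missing directly skips normal method lookup entirely.
puts recorder.method_missing('p').inspect
```

The same reasoning applies to any tag whose name matches a method that already exists on the object, not just 'p'.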

xml['myDoc'].method_missing 'para'

If you are sure that there are no invalid characters in your tag name (which there really should never be), then this is the cleanest:

xml['myDoc'].para

And no more NAMESPACE_ERR in JRuby.

tags: #jruby #xerces #nokogiri #xml #builder #java
