September 20, 2015

Groovy by example: XML / HTML transformation (part 2 of 2)



Transforming XML (continued)

Example: Flip an HTML table by 90°

I have built the example transformation function in two parts:
protected static String flipTable(def html) {
    // 1. Implement row / column flip
    // 2. Build new table
}
For simpler transformation tasks, it may be preferable to combine these two steps directly in the builder code.
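Here is a sketch of that single-pass alternative. It assumes Groovy 3 or later, where XmlSlurper is available as groovy.xml.XmlSlurper. The code copies a small, made-up table and turns the first cell of every row into a header cell while building, with no intermediate data structure:

```groovy
import groovy.xml.MarkupBuilder
import groovy.xml.XmlSlurper

// Small stand-in input table (hypothetical data)
def source = new XmlSlurper().parseText('''<table>
  <tr><td>Country</td><td>Capital</td></tr>
  <tr><td>Indonesia</td><td>Jakarta</td></tr>
</table>''')

def writer = new StringWriter()
new MarkupBuilder(writer).table {
    // plain Groovy loops over the parsed input, interleaved with builder calls
    source.tr.each { row ->
        tr {
            row.td.eachWithIndex { cell, i ->
                // transform while building: the first column becomes a header cell
                if (i == 0) {
                    th(cell.text())
                } else {
                    td(cell.text())
                }
            }
        }
    }
}
String result = writer.toString()
println result
```

This works because the transformation never needs to look ahead or reorder cells. The table flip does need to reorder cells, which is why it is split into two steps.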

Transforming the structure

Here is the first part of the method:
Writer writer = new StringWriter()
MarkupBuilder xmlBuilder = new MarkupBuilder(writer)

Map rows = [:].withDefault{[]}

// 1. Implement row / column flip
html.body.table.tbody.tr.eachWithIndex { _tr, _tr_i ->
    // for each row
    _tr.'*'.eachWithIndex { cell, cellNo ->
        // for each cell in row: by default it goes to output row cellNo
        int col = cellNo
        if (rows[col][_tr_i] != null) {
            // if its place is already occupied that means that it was filled by a previous cell
            // with colspan/rowspan > 1. Search for the next free place in the same row
            col = searchNextFreeSpace(rows, cellNo, _tr_i)
        }
        rows[col][_tr_i] = cell
        // treat a missing attribute as a span of 1
        int rowspan = (cell.@rowspan ?: '1').toInteger()
        int colspan = (cell.@colspan ?: '1').toInteger()
        // mark all other positions consumed by the COLspan / ROWspan as "occupied" by storing
        // false there, relative to the cell's actual position col. Indexed assignment (instead
        // of <<) pads the list with nulls, so each flag lands at its exact position even if a
        // previous row was shorter
        colspan.times { c ->
            rowspan.times { r ->
                if (c > 0 || r > 0) {
                    rows[col + c][_tr_i + r] = false
                }
            }
        }
    }
}
The first part is the actual “row to column flip” implementation, which isn’t that interesting from our XML-centric point of view. The important thing is that it works with the originally parsed node tree. It returns a data structure that still contains the original nodes, only arranged differently.

The code builds a map of all output rows, mapping each row number to the list of cells that row will contain. It walks through the input table row by row and cell by cell, and inserts each cell into the new data structure at its appropriate position. The tricky part, which takes up most of the code lines, is handling cells with colspan > 1 or rowspan > 1. Such cells mark the neighbor cells they occupy with a boolean flag. In subsequent iterations, whenever a target cell is already occupied, the next cell is checked and used if it is free, jumping to the next row / column (this is implemented in the searchNextFreeSpace(…) method).
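To see the placement logic in isolation, here is a self-contained sketch that applies the same map-of-lists approach to plain Groovy maps instead of parsed nodes. The cell maps (with text / colspan / rowspan keys) and this searchNextFreeSpace implementation are assumptions made for illustration. The article’s own searchNextFreeSpace(…) is not shown here and may differ. The span flags are written by index rather than appended, so each flag lands at exactly the covered position:

```groovy
// Hypothetical input: each cell is a map with text / colspan / rowspan keys.
// A spans two columns, B spans two rows.
def input = [
    [[text: 'A', colspan: 2, rowspan: 1], [text: 'B', colspan: 1, rowspan: 2]],
    [[text: 'C', colspan: 1, rowspan: 1], [text: 'D', colspan: 1, rowspan: 1]]
]

// Hypothetical helper: first output row at or after 'start' whose slot for inputRow is free
int searchNextFreeSpace(Map rows, int start, int inputRow) {
    int col = start
    while (rows[col][inputRow] != null) {
        col++
    }
    return col
}

// output row index (= input column) -> list of slots (= input rows)
Map rows = [:].withDefault { [] }
input.eachWithIndex { tr, trIndex ->
    tr.eachWithIndex { cell, cellNo ->
        int col = cellNo
        if (rows[col][trIndex] != null) {
            col = searchNextFreeSpace(rows, cellNo, trIndex)
        }
        rows[col][trIndex] = cell
        // flag every other grid position the cell covers; indexed assignment pads with nulls
        cell.colspan.times { c ->
            cell.rowspan.times { r ->
                if (c > 0 || r > 0) {
                    rows[col + c][trIndex + r] = false
                }
            }
        }
    }
}

// '.' = missing cell, '-' = slot covered by a span
def rendered = rows.keySet().sort().collect { k ->
    rows[k].collect { slot -> slot == null ? '.' : slot instanceof Boolean ? '-' : slot.text }.join(' ')
}
println rendered.join('\n')
```

For this 2×3 input grid, the flipped output reads “A C”, “- D” and “B -”. The - entries are the positions covered by A’s colspan and B’s rowspan.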

Rebuilding the structure

Here is the second part of the method:
// 2. Build new table
xmlBuilder.table(html.body.table[0].attributes()) {
    tbody {
        rows.each { rowIndex, tuples ->
            tr {
                tuples.each { tuple ->
                    if (tuple == null) {
                        // this cell was originally not present. Insert an empty one
                        td()
                    }
                    else if (tuple in Boolean) {
                        // this cell is marked as "jump over" due to previous colspan/rowspan > 1
                    }
                    else {
                        // insert the whole XML tree
                        copy(flipCell(tuple, rowIndex), xmlBuilder)
                    }
                }
            }
        }
    }
}

return writer.toString()
  • Using the HTML builder, build a table just as in the previous example (no need for any parent structures), preserving the original table’s attributes.
  • Inside the table, place a tbody node (statically).
  • The next code line is a Groovy loop, not a builder invocation: iterate over the previously created rows and, for each row, build a tr element.
  • Then iterate over the tuples of the row and build the td / th element accordingly.
  • Because of the way HTML tables work, a row may contain fewer cells than it is supposed to; the browser simply moves on to the next row. In the example table, this is the case in the “Indonesia” row. Because we have swapped rows and columns, this no longer works: omitting a cell would shift the following columns. So we detect null cells and insert an empty cell instead.
  • If the cell is a Boolean, it is simply a placeholder for a position previously marked as occupied by a colspan / rowspan. Skip it.
  • Otherwise, apply additional local transformations to the cell and insert its entire subtree into the structure. The cell transformation is done in the flipCell(…) method, which we will inspect next. Recursively copying the node plus all its sub-nodes into the tree is done by our previously prepared copy(…) method.
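A stripped-down, runnable version of this builder loop may help. It assumes a hypothetical rows map that already holds plain strings instead of parsed nodes, so td(tuple) / th(tuple) stand in for the copy(flipCell(…), …) call. Here the th conversion for the first row happens inline rather than in flipCell(…). The null and Boolean slots are handled as described in the list above:

```groovy
import groovy.xml.MarkupBuilder

// Hypothetical flipped structure: output row index -> slots
// (String = cell content, null = missing cell, false = covered by an earlier span)
Map rows = [
    0: ['Country', 'Indonesia', 'Japan'],
    1: ['Capital', null, 'Tokyo'],
    2: ['Region', 'Asia', false]
]

def writer = new StringWriter()
def xml = new MarkupBuilder(writer)
xml.table(border: '1') {
    tbody {
        rows.each { rowIndex, tuples ->
            tr {
                tuples.each { tuple ->
                    if (tuple == null) {
                        // missing cell: pad with an empty one so the following cells don't shift
                        td()
                    } else if (tuple in Boolean) {
                        // covered by an earlier colspan/rowspan: emit nothing
                    } else if (rowIndex == 0) {
                        // first output row: header cells
                        th(tuple)
                    } else {
                        td(tuple)
                    }
                }
            }
        }
    }
}
String html = writer.toString()
println html
```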
flipCell(…) performs a very use-case-specific modification of the node: we have to turn former colspan attributes into rowspan attributes and vice versa, and do other similar clean-up work:
private static Node flipCell(originalNode, int index) {
    Map originalAttributes = originalNode.attributes()
    // copy all attributes except colspan / rowspan...
    Map attributes = originalAttributes.findAll { it.key != 'colspan' && it.key != 'rowspan' }
    // ...then swap the two span attributes, carrying over only those actually present
    if (originalAttributes.rowspan != null) {
        attributes.colspan = originalAttributes.rowspan
    }
    if (originalAttributes.colspan != null) {
        attributes.rowspan = originalAttributes.colspan
    }
    // change td nodes to th nodes if they are in the first row
    QName name = new QName(originalNode.name().namespaceURI, index == 0 ? "th" :
        originalNode.name().localPart)
    return new Node(originalNode.parent(), name, attributes, originalNode.value())
}
Again, the internals of this method are not really interesting from an XML transformation point of view. The method returns a new Node that copies the information from the original node provided (including all of its children). It differs in two ways: it swaps the colspan and rowspan attributes, and it turns cells into header cells if they are now in the first row. Note that this latter adjustment means the entire operation is not strictly reversible (because there is no th to td transformation).
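The attribute logic can be tried out on plain maps. The following sketch uses two hypothetical helpers, swapSpans and flippedName, that mirror the idea of flipCell(…) without touching groovy.util.Node. It also shows why the overall operation is not reversible. Swapping the spans twice restores the original attributes, but a cell renamed to th stays th when flipped back:

```groovy
// Hypothetical helper: swap colspan and rowspan in an attribute map,
// carrying over only the span attributes that are actually present
Map swapSpans(Map attrs) {
    Map result = attrs.findAll { k, v -> k != 'colspan' && k != 'rowspan' }
    if (attrs.rowspan != null) {
        result.colspan = attrs.rowspan
    }
    if (attrs.colspan != null) {
        result.rowspan = attrs.colspan
    }
    return result
}

// Hypothetical helper: cells that end up in the first output row become header cells
String flippedName(String localName, int rowIndex) {
    rowIndex == 0 ? 'th' : localName
}

def original = [colspan: '2', id: 'c1']
println swapSpans(original)
println swapSpans(swapSpans(original))
println flippedName(flippedName('td', 0), 1)
```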

Even for the main use case of flipping a table, there are many ways to implement the actual logic. The code listings shown here most certainly do not represent an optimal solution; they are really just an illustrative example.

Printing XML

As mentioned previously, with a new MarkupBuilder(StringWriter), you get pretty printing of the resulting XML markup for free:
writer.toString()
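Here is a minimal check of that default formatting. With a plain Writer, MarkupBuilder puts each element on its own line and indents nested elements by two spaces:

```groovy
import groovy.xml.MarkupBuilder

def writer = new StringWriter()
new MarkupBuilder(writer).table {
    tr {
        td('a')
    }
}
String xml = writer.toString()
println xml
```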

Printing HTML

However, when using HTML as webpage markup, there’s a well-known problem which especially applies to pretty-printed markup. The whitespace added for pretty formatting can end up right next to inner text, and the browser may then render it as additional visible whitespace. This problem is discussed at length in this Stack Overflow thread.

I have also incorporated one of the solutions offered in this Stack Overflow thread, namely inserting comments at the critical places in the markup. I do so with a plain String regex in the formatHtml(String) method:
protected static String formatHtml(String html) {
    return html.replaceAll($/([^>])\n(\s*)</$, { all, lineEnd, space ->
        // text, followed by EOL, followed by a tag
        "${lineEnd}<!--\n${space}--><"
        // insert comment between EOL and tag
    })
}
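To see what the regex does, here is formatHtml(String) from above, copied into a standalone script and applied to a small hand-written sample (the sample markup is made up for illustration):

```groovy
// formatHtml(String) as listed above, as a plain script method
String formatHtml(String html) {
    return html.replaceAll($/([^>])\n(\s*)</$, { all, lineEnd, space ->
        // text, followed by EOL, followed by a tag:
        // move the line break and indentation into an HTML comment
        "${lineEnd}<!--\n${space}--><"
    })
}

String pretty = "<p>\n  Hello\n  <b>world</b>\n</p>"
String fixed = formatHtml(pretty)
println fixed
```

Only the line break between the text “Hello” and the following <b> tag is wrapped into a comment, so the browser sees no whitespace there. Line breaks directly after a closing > are left untouched.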

Conclusion

XML processing is actually one of my favorite applications of the Groovy programming language. It really shows Groovy’s versatility as both a full-fledged general-purpose programming language and a quick and handy tool for everyday tasks. As a Java programmer, I use it as a powerful tool to tackle the otherwise cumbersome task of XML processing very concisely, while staying on well-known Java terrain.

In my view, a major hurdle for this example implementation was the limited amount of information available about Groovy’s XML parsing / building facilities. There is quite a lot of basic information on the official website alone, but those examples don’t really convey the knowledge necessary to tackle non-trivial real-world XML transformation tasks. I really hope that this article helps you if you’re stuck with Groovy XML / HTML processing, and that it gives you new ideas for handling your problem.

Again, the complete source code of this example implementation is available on GitHub.

Please let me know in the comments section whether this article was helpful for you or if it lacks any important information.

