Open-source vs. Microsoft tools for business-app implementation, Part 2

Installing Cocoon and its related files is a snap... but we still need your two cents

(LinuxWorld) So far I've received little feedback on the key business issues in the first Cocoon article, although quite a lot of comment has come in on two cost-related issues: the impact Microsoft licensing has on hardware choices and the need to use BizTalk.

No one questioned the notion that you'd put all of the applications on the same box on the Unix side, but several people wanted to know why I put everything on one box instead of using an N-tier architecture for the Windows side.

There are two answers to that:

  1. First, I didn't want to load the comparison against Windows by using Windows SMP (a.k.a. the rackmount) to get inter-application isolation. I think doing that would drive up Windows-side costs quite dramatically and be considered unfair, particularly when we look at the complexities of recoverability and synchronization in a two-site environment.
  2. More importantly, there's a technical-management reason that starts with a negative. The applications don't need separate CPU and memory resources because they generally operate as a pipeline (meaning that first one is busy, then the next, and so on — rather than all of them competing for CPU and memory resources at the same time). Because there's no resource requirement for separate computers, I can get significant managerial simplification by putting everything on one machine.

Help! We need your feedback!
This little series about doing something fairly hard in both a Linux and a Windows environment depends on relevance and facts from reader contributions. If you've worked with or are thinking of working with either toolset, please contact the author. If you know someone with experience to share, particularly on the Windows side, please ask them to read this article.

The complexity I'm trying to avoid comes from the need to maintain a very high level of confidence in the integrity of the data stored. Because most security-related problems arise due to internal action, the ability to avoid the additional points of vulnerability that go with the rackmount — things like second network cards — seems highly desirable.

On the other hand, Microsoft's licensing policies might make an N-tier approach more financially attractive than I had expected. Several people told me that I would need the enterprise license for SQL Server and cannot run a single-processor license on a dual-CPU machine.

Because the enterprise license for SQL Server is $19,999 per CPU, it would be quite a lot cheaper to put SQL Server on a single-CPU machine by itself than to buy a second license.

Similarly, BizTalk Enterprise Edition is $24,999 per CPU. If I need BizTalk, buying a separate uni-processor to run it will save about $16K up front and thus be worthwhile despite the need to buy a rack, a switch, four more net cards and three more $999 Windows 2000 Server licenses.
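The BizTalk arithmetic can be sketched as follows. The license prices and the roughly $16K saving come from the figures quoted above; the hardware budget is not stated anywhere, so the script only backs out the amount those figures imply. Reading the saving as "one avoided per-CPU BizTalk license" is an assumption.

```shell
#!/bin/sh
# List prices quoted in the article (US dollars).
biztalk_per_cpu=24999      # BizTalk Enterprise Edition, per CPU
win2k_server=999           # Windows 2000 Server, per machine
extra_win2k_licenses=3     # additional server licenses for the N-tier layout

# Approximate up-front saving quoted in the article.
quoted_saving=16000

# Avoided cost: the second per-CPU BizTalk license a dual-CPU box would need
# (an assumed reading of the comparison).
avoided=$biztalk_per_cpu

# Added software cost: the extra Windows 2000 Server licenses.
added_licenses=$((extra_win2k_licenses * win2k_server))

# The remainder is the hardware budget implied by the quoted saving
# (extra machine, rack, switch, four network cards). Inferred, not quoted.
implied_hardware=$((avoided - added_licenses - quoted_saving))

echo "Avoided BizTalk license:  \$$avoided"
echo "Extra Windows licenses:   \$$added_licenses"
echo "Implied hardware budget:  \$$implied_hardware"
```

On these assumptions, the quoted saving leaves about $6,000 for the extra hardware.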

The notion that Microsoft licensing drives deconsolidation is a new idea for me and will take some thinking about.

Even more people questioned the use of BizTalk for this application.

Bruce Hutfless, who seemed to know what he was talking about, said I didn't need either BizTalk or ISA, which I described as a caching server:

"That's the Microsoft line, it is really just a cheap firewall. Which is why I have never used it.

"You can do XML/XSL translation without BizTalk. All you are getting is pre-canned SOAP and some XML/XSL templates and translations. An ActiveX DLL server-side object using the MSXML4 SDK gives you everything you need. What BizTalk gives you is some server-side objects and a pipeline. Just as easy to implement your own pipeline.

"The problem is you have to hunt to find the MSXML4 SDK, which is the XML/XSL parser used in Windows 2000, Internet Explorer 6.0 and .Net. Microsoft hasn't done much to promote this little jewel. For obvious reasons!

"In point of fact, ISA, BizTalk are going by the wayside in .Net server. Right now, a la Windows 2K, there are 13 different server product offerings of the OS. MS claims they will limit this number in the next release of .Net."

I'm still not sure about this... although if I needed the enterprise license, I'd sure want to prove him right. Microsoft included BizTalk in its promise to eventually bring forth a Cocoon-like bundle (its Jupiter announcement), but no one with significant Microsoft experience offered a clearly better idea tied to the Nichievo application.

So far, no one's come forward with manpower-utilization information. One person with a .co.uk address did ask me (in polite language, to boot) what I'd been smoking to pick Cocoon for this job, suggesting that ordinary people could deliver this application with more widely understood tools.

He's right: Cocoon isn't necessary to do the Nichievo job. I picked it, however, to offer downstream benefits the other guys can't match. Using it isn't critical to doing the job; it is critical to winning the job.

Several people told me that they can't use stuff like Cocoon on Linux because they can only find Windows programmers to hire.

Aside from the instinctive response that real programmers don't do Windows, this is economic nonsense. The market evolves to meet demand; if you can only get Windows people, it's because that's who you hire. Demand Linux expertise, and you'll quickly get people willing to try. This, in my experience, is the best you can say about nine out of ten Windows people: that they're willing to try.

This ties to one of those odd things you often see in systems consulting. People have learned about the value of experience in using systems technologies, so you see formal RFPs, particularly government and large data-center RFPs, requiring five years of experience with just-released products... and big-name consultants solemnly signing off as having it! It's a nonsensical response to a nonsensical requirement, but it illustrates the market at work. Make Linux expertise a condition of employment and people will make the learning investment needed to meet your requirements.

I recently had a lunchtime conversation with a client that went something like this:

Client: Linux came up in the discussion on our new Web-documents server, but we're going with Windows.

Me: Why?

Client: IT won't support Linux. They say that if we use it, we're on our own.

Me: So what happened with your departmental e-mail last week?

Client: We eventually got Greg [a contractor] in. Those guys at Kingsway don't want to be bothered coming out. Talking to them is, is... I mean what the hell does "registry corruption" mean? And why do they always make it seem like it's our fault and it never happens to anyone else?

Me: But you're staying with Windows for the Web server because they'll support it?

Client: What do you think of this Iraq business?

The bottom line is simple: you get what you pay for. If you don't get the support you pay for, whether that's Linux or Windows, you've got a management problem that calls for defenestration, not resigned acquiescence.

Installing Cocoon

As the first step in doing the work, I installed the tools (on Solaris instead of Linux, because that's what's on my desk).

The documentation says to install the Apache Tomcat servlet engine first. So I downloaded that and discovered that I needed the Sun Java SDK 1.4 installed for Tomcat to work.

The SDK installed itself into /tmp via a self-extracting shell file; I moved it to /opt and used setenv to point JAVA_HOME at it. Then I untarred Tomcat in place and set CATALINA_HOME the same way. One minute and 47 seconds later, my 8080 port was showing the Tomcat welcome page.
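For readers who want to repeat the step, here is a minimal sketch in Bourne-shell syntax (the article's own commands used csh's setenv). The two directory names are hypothetical placeholders, not the paths actually used. The startup script is only run if it exists, so the fragment is harmless on a machine without Tomcat.

```shell
#!/bin/sh
# Hypothetical install locations: adjust to wherever the SDK and Tomcat
# were actually unpacked. Existing values in the environment win.
JAVA_HOME="${JAVA_HOME:-/opt/j2sdk1.4}"
CATALINA_HOME="${CATALINA_HOME:-/opt/tomcat}"
export JAVA_HOME CATALINA_HOME

# Start Tomcat only if the startup script is actually present.
if [ -x "$CATALINA_HOME/bin/startup.sh" ]; then
    "$CATALINA_HOME/bin/startup.sh"
else
    echo "Tomcat not found under $CATALINA_HOME" >&2
fi
```

Once Tomcat is running, the welcome page should answer on port 8080 of the local machine.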

Getting Cocoon installed was trickier (no Solaris pkg or binaries), so I downloaded the source file for 2.0.3, gunzipped it and then tried to untar it.

Tar bombed out with a directory checksum error, so I promptly fired off a bug report about the tar problem.

Then I downloaded the previous release, 2.0.2, and tried again... with the same result. But including the -i parameter (ignore directory checksum errors) seemed to work, so I set the environment variables needed and fired off build.sh.

That failed, with lots of helpful messages such as:

../cocoon/code/cocoon-2.0.3/build/cocoon/src/org/apache/cocoon/serialization/POIFSSerializer.java:163: cannot resolve symbol
symbol : class POIFSElementProcessor
location: class org.apache.cocoon.serialization.POIFSSerializer
( ( POIFSElementProcessor ) processor ).setFilesystem(_filesystem);

Some fresh coffee helped me remember that standard tar has a problem with absurdly long paths and filenames. GNU Tar unpacked the file with no problems, so then I had to go file a "sorry, there's a bug, but it's not in your code..." report.
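One way to avoid the trap is to insist on GNU tar before unpacking. The sketch below assumes GNU tar is installed either as gtar (a common name on Solaris, but an assumption here) or as the system tar. The function names and the archive name in the usage note are illustrative only.

```shell
#!/bin/sh
# Print the name of a GNU tar binary, or fail if none is found.
pick_tar() {
    if command -v gtar >/dev/null 2>&1; then
        echo gtar
    elif tar --version 2>/dev/null | grep 'GNU tar' >/dev/null; then
        echo tar
    else
        echo "no GNU tar found; long paths may not unpack correctly" >&2
        return 1
    fi
}

# Unpack a gzipped tarball ($1) into the current directory using GNU tar.
unpack() {
    tar_cmd=$(pick_tar) || return 1
    gunzip -c "$1" | "$tar_cmd" xf -
}
```

Usage would be along the lines of `unpack cocoon-src.tar.gz`, with whatever the downloaded file is really called substituted for that placeholder name.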

One minute and nine seconds after starting build.sh a second time, the Cocoon page was up at 8080/cocoon. Start to finish, the whole thing had taken just under 90 minutes of elapsed time, most of it waiting for a total of about 107 megabytes of downloads to finish.

It doesn't get better
Joe Barr, who's been writing for LinuxWorld on comparing Linux and Windows install processes, should check this out. Download, unzip, use the tools provided to build or install, and the whole complicated structure fires up and runs. No licenses, no reboots, no media swaps, and for those who use GNU Tar by default (i.e., Linux users), no errors.

I don't have a Win2K box, or the necessary licenses, with which to try this project, but everything I've done with Windows suggests it's nowhere near as slick and effective in loading new application toolsets as this is.

Getting the system to switch to https in order to encrypt the material being sent back and forth to and from the user was almost equally trivial. Once I found the documentation on running keytool to create my own certificate (needed to start encrypted sessions, not for authentication), it took about two minutes to copy the connector definitions into server.xml and restart Tomcat to get a functioning https server connection on port 8443.
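The certificate step can be sketched as a non-interactive keytool call. The alias, password, keystore path and distinguished name below are placeholders chosen for illustration, not the values actually used. The SSL connector block itself still has to be enabled by hand in server.xml, and its exact form depends on the Tomcat release.

```shell
#!/bin/sh
# Create a self-signed RSA key pair in the keystore file named by $1.
# "changeit" and the CN/OU/O/C values are illustrative placeholders.
make_tomcat_key() {
    keystore=$1
    keytool -genkey -alias tomcat -keyalg RSA \
        -keystore "$keystore" \
        -storepass changeit -keypass changeit \
        -dname "CN=localhost, OU=Test, O=Example, C=US"
}

# Usage (commented out so that nothing runs by accident):
# make_tomcat_key "$HOME/.keystore"
# "$CATALINA_HOME/bin/shutdown.sh" && "$CATALINA_HOME/bin/startup.sh"
```

With the key in place and the connector enabled, restarting Tomcat should bring up the https listener on port 8443.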

Everything worked out of the box... well, except for the box. Java may be the slickest tool yet for turning an UltraSPARC II into an i80386. Loaded but idle, Tomcat/Cocoon uses 19.1 megabytes of RAM and 0.19 percent of one CPU. But after I read some of the online Cocoon documentation and played with the sample Web application provided, the resource usage had zoomed to 188MB of RAM, and some page requests took 100 percent of a CPU and measurable time to fill.

Astonishingly, it's actually faster to read documentation directly from the Apache site than from my own test machine, provided I don't read anything twice. Caching helps a lot; the second time you process a document, response seems normal even if you use a different browser to avoid browser-caching effects.

Apache provides lots of documentation, but much of it assumes that the reader is comfortable with operational concepts and terminology for both Java and XML. However, the Cocoon Overview document does illustrate how complex subjects can be clearly and simply introduced and should be required reading for anyone looking at working with this product set.

The core idea is the separation of management, logic, content and style. This means that you can change any one of these without affecting the other three. Because it's aimed mainly at Web publishing, the classic example in one of the documents I read involves setting up a style to match a particular holiday, then switching the entire appearance of the site for one day by exchanging one definitions file for another.

That's neat, but it's not what Nichievo needs. On the other hand, the technology looks like a near-perfect fit here. The separation idea, for example, is carried forward to the basic Web-site structure. A sitemap file sets the general rules for the site: what's included, how it's processed and so on. But you can have sub-sitemaps that apply different rules to different hierarchies or projects.

This meshes perfectly with what we want to achieve at Nichievo:

  • storing the data once (actually twice, but the document copy exists for legal rather than functional reasons)
  • writing the logic once
  • getting the users to learn one set of addresses, passwords, and behaviors
  • maintaining several very different production environments or Web apps throughout all these processes

In playing with the samples provided, two things become very clear:

  1. Overall, this stuff is way beyond cool. It'll not only do what I want for Nichievo, it really may make a lot of the advanced stuff of their dreams deliverable within this project's lifetime.
  2. It looks like almost everything I want to do at Nichievo qualifies as "easy." This means it's a variation on what comes with the system. Of course, going from perception to reality requires expertise... expertise I don't have. Realistically, I believe that it would take two to three days for an expert to get this project to the working prototype stage and about the same to debug to deliverable status. It will take me several weeks, most of it spent learning stuff that an expert would consider pretty basic.

Need some help here!
Reader feedback is particularly important now, because the next article in this series looks at database and related code-development issues.

From a programming perspective, Cocoon does several things I hadn't known about. Under development is an entire authentication framework that, even in its current state, will let me avoid having to invent that wheel. There's also enough WebDAV capability already available that I'm considering using it to avoid worrying about managing file transfers while giving users freedom to add or modify things (...illusory freedom, since everything gets logged and nothing ever gets deleted).

Equally important, the data-handling for forms-validation, database-access and process-flow management is more advanced than I expected. Maybe you can't handle everything through these facilities, but I don't see any major gaps. That means I can reduce my expected time to finish those tasks while reducing my vulnerability to code changes down the line.

It also means that next time we'll be discussing database issues, not language issues. On the Linux side, at least, language may be a non-issue if I can find a way to make getting signoffs a point-and-click process that doesn't require back-end coding. On the Windows side, unfortunately, I have no idea — and boy, do I need feedback on this!

Paul Murphy wrote and published 'The Unix Guide to Defenestration'. Murphy is a 20-year veteran of the IT consulting industry.