Standards for Technology in Automotive Retail
Compression may not be required for all messages because the cost of compressing the payload or implementing a compression function may outweigh the benefit. If compression is to be used on a document, there must be a way to communicate that the payload is compressed before the receiver attempts to process the payload.
Based on a variety of research, it was determined that the most appropriate algorithm for a payload compression mechanism was that found in gzip. This section will review the key requirements that went into that selection and describe how gzip meets the criteria and why it was selected as the preferred standard compression algorithm. Other compression algorithms exist in the marketplace that may have benefits over gzip but the wide adoption of gzip makes it suitable for a minimum requirement.
Other compression algorithms are not precluded from STAR usage. Any other algorithm used must have two considerations addressed:
The type of algorithm MUST be transmitted as an element in the uncompressed SOAP envelope instead of “gzip”.
Between the two specific partners, the partner agreement (CPA, WSDL, or out-of-band) specifies that both parties support that algorithm before sending the message.
gzip is a lossless compressed data format. The deflate algorithm used by gzip (and also by zip and zlib) is an open-source, patent-free variation of LZ77 that finds duplicated strings in the input data. The second occurrence of a string is replaced by a pointer to the previous occurrence, in the form of a pair (distance, length). Distances are limited to 32K bytes and lengths are limited to 258 bytes. When a string does not occur anywhere in the previous 32K bytes, it is emitted as a sequence of literal bytes. (In this description, "string" must be taken as an arbitrary sequence of bytes and is not restricted to printable characters.)
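The (distance, length) replacement described above can be illustrated with a small sketch. This is a toy greedy matcher written for this explanation only, not the deflate implementation used by gzip (which also applies Huffman coding and a much faster search). The function names and the minimum match length of 3 are assumptions made for the example.

```python
def lz77_tokens(data: bytes, window: int = 32 * 1024,
                max_len: int = 258, min_len: int = 3) -> list:
    """Toy LZ77: return a list of literal bytes (int) and (distance, length) pairs."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Search the previous `window` bytes for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_len and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            tokens.append((best_dist, best_len))  # pointer to the earlier occurrence
            i += best_len
        else:
            tokens.append(data[i])                # no useful match: emit a literal
            i += 1
    return tokens


def lz77_expand(tokens: list) -> bytes:
    """Rebuild the original bytes from the token list."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            for _ in range(length):
                out.append(out[-dist])            # copy from `dist` bytes back
        else:
            out.append(tok)
    return bytes(out)


sample = b"<item>abc</item><item>abc</item>"
tokens = lz77_tokens(sample)
restored = lz77_expand(tokens)
```

In this sample the second `<item>abc</item>` is represented by a single pair pointing 16 bytes back, which is the behaviour that makes repetitive XML compress well.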
Since the amount of compression obtained depends on the size of the input and the distribution of common substrings, XML documents, with their repeated tags and large amounts of whitespace, are well suited to gzip. Typically, text such as source code or English is reduced by 60-70%, while large XML documents can be reduced by more than 90%. The GNU implementation of gzip is covered by the GNU General Public License. gzip is available on all major platforms and is widely used and implemented.
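A rough way to check the size reduction for a given document is to compress it and compare lengths. The sketch below uses Python's standard gzip module; the sample document is synthetic and hypothetical (element names invented for the example), and because it is far more repetitive than a real BOD it will overstate the saving a production document would see.

```python
import gzip


def gzip_saving(payload: bytes) -> float:
    """Return the fraction of bytes saved by gzip (0.9 means a 90% reduction)."""
    compressed = gzip.compress(payload)
    return 1.0 - len(compressed) / len(payload)


# Hypothetical indented XML fragment, repeated to simulate a large document.
line = (b"        <Line>\n"
        b"            <PartNumber>12345</PartNumber>\n"
        b"            <Quantity>1</Quantity>\n"
        b"        </Line>\n")
sample_xml = b"<Order>\n" + line * 2000 + b"</Order>\n"

saving = gzip_saving(sample_xml)
print(f"saved {saving:.1%} of {len(sample_xml)} bytes")
```

Running the same function against representative BODs gives a more realistic figure than the synthetic sample.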
With payload compression, the compression mechanism is only needed for the compression of payloads that can benefit, such as the STAR BODs. Normally, executable, multimedia, and binary data formats are efficient enough such that little gain is realized from compression. The header of the SOAP message will maintain a relatively consistent size and will not be large enough to require compression. The effort to compress and decompress the SOAP header will affect performance especially when traversing through intermediaries that need to examine the header. Where this is a concern, the focus should be on compression of the XML to significantly reduce transmission time and increase performance, but retain uncompressed headers to avoid intermediate decompression/recompression.
The following issues arise when using payload compression:
If an algorithm other than gzip is used, then a mechanism for advertising the algorithm with the message MUST be included and both parties MUST support that algorithm.
When programmatically assembling and processing messages, a mechanism to programmatically handle the compressed attachments at the endpoint MAY be necessary.
The application needs to determine whether to compress each payload, since there is no way to distinguish between pre-compressed content and text content at the transport level.
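One way an application could make that determination is sketched below. It is an illustration under stated assumptions, not a STAR requirement: the function name, the 1024-byte minimum size, and the rule of keeping the original when compression does not shrink it are all choices made for the example. The check for already-compressed content relies on the two-byte gzip magic number (0x1f 0x8b) defined in the gzip file format.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def maybe_compress(payload: bytes, min_size: int = 1024):
    """Return (bytes_to_send, was_compressed).

    min_size is a hypothetical cut-off below which compression is skipped.
    """
    # Skip tiny payloads and payloads that are already gzip-compressed.
    if len(payload) < min_size or payload.startswith(GZIP_MAGIC):
        return payload, False
    compressed = gzip.compress(payload)
    # Keep the original if compression did not actually help (e.g. JPEG data).
    if len(compressed) >= len(payload):
        return payload, False
    return compressed, True


xml_payload = b"<PartNumber>12345</PartNumber>\n" * 1000
body, was_compressed = maybe_compress(xml_payload)
```

The returned flag is what the sender would use to advertise compression to the receiver.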
HTTP compression is important for both STAR Web Services and ebMS transport methods. HTTP compression is the technology used to compress content served by a Web server (also known as an HTTP server). The Web server content may be in the form of any of the many available MIME types: HTML, plain text, image formats, PDF files, XML, etc.
HTTP Compression Exchange
The publicly defined exchange between the requester and the web server serving the HTTP resources can be summarized as follows:
A web client (e.g. web browser) that is capable of receiving compressed content indicates this in all of its requests for resources by supplying the "Accept-Encoding:" request header field. The field name is followed by a comma-separated list of encoding names.
When the Web server sees that request field, it understands that the client is able to receive compressed data in gzip and any other formats specified in the Accept-Encoding header.
If a compressed static version of the requested document is found on the Web server's file system and matches one of the formats the client says it can handle then the server can simply choose to send the pre-compressed version of the document instead of the much larger uncompressed original.
If no static document is found on the file system which matches any of the compressed formats the client can accept, then the server can choose either to send the original uncompressed version of the document or to attempt to compress the resource in "real-time" and send it back to the client.
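The four steps above can be sketched as a small server-side decision function. This is a simplified illustration, not the behaviour of any particular Web server: the function names and parameters are hypothetical, only gzip is considered, and quality values in Accept-Encoding are honoured only to the extent of ignoring encodings marked q=0.

```python
import gzip
from typing import Optional


def parse_accept_encoding(header: str) -> set:
    """Return the lowercase encoding names the client accepts (q=0 entries dropped)."""
    names = set()
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed quality value as "not acceptable"
        if q > 0:
            names.add(name)
    return names


def respond(body: bytes, accept_encoding: str,
            precompressed: Optional[bytes] = None, dynamic: bool = True):
    """Return (response_headers, response_body) for one request."""
    if "gzip" in parse_accept_encoding(accept_encoding):
        if precompressed is not None:
            # A static pre-compressed copy exists: send it as is.
            return {"Content-Encoding": "gzip"}, precompressed
        if dynamic:
            # No static copy: compress in real time.
            return {"Content-Encoding": "gzip"}, gzip.compress(body)
    # Client did not ask for gzip, or the server cannot compress: send the original.
    return {}, body


page = b"<html>" + b"x" * 1000 + b"</html>"
headers, payload = respond(page, "gzip, deflate")
```

A client that omits the Accept-Encoding field, or lists only encodings the server does not support, receives the uncompressed original.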
HTTP Compression Standards
Content-Encoding, Transfer-Encoding and HTTP compression are defined in the HTTP 1.1 protocol specification and are recommended for improved page download time.
Benefits of Using HTTP Protocol Compression
Three independent studies highlight the benefits of HTTP compression: two conducted by the WWW Consortium (W3C) and one conducted for the Mozilla organization. The first W3C study, reported in 1997, focused on testing the effects of HTTP 1.1 persistent connections, pipelining, and link-level document compression. The second W3C study, reported in 2000, looked at the possible performance benefits of compressing HTML files over a LAN with composite HTML data (compressed) and image content (uncompressed). The Mozilla study “Speed Web delivery with HTTP compression” (Radhakrishnan), reported in 1998, observes the performance of content-encoded compression.
Additionally, no programmatic manipulation is required to introduce HTTP-level compression, since this is managed at the transport layer by the infrastructure.
Web Server support
Most popular Web servers are still unable to handle the final step (see HTTP Compression summary above) in the HTTP exchange where they are required to perform real time compression on the resource before sending it to the client. For example:
The Apache Web Server, which has a large share of the Web server market, is still incapable of providing any real-time compression of requested documents. However, there is an open source module (mod_gzip) available for Apache that enables such compression.
Microsoft's Internet Information Server: if it finds a pre-compressed version of a requested document it might send it, but it has no real-time compression capability. IIS 5.0 uses an ISAPI filter to support gzip compression: when the client requests a resource, the server serves it and then stores a compressed copy of it in a temporary folder. Subsequent requests are served the compressed copy.
IBM's WebSphere Server, which is based on Apache, and the SunONE Web Server have some limited support for real-time compression through the use of an open source patch.
There are third party products available that can be plugged into web servers to enable compression, e.g. JXEL. Such plug-in type products enable HTTP compression for multiple web server types. If web servers are used to implement the STAR transport mechanism, then they must be evaluated for their ability to provide the final step of HTTP compression.
As mentioned, compression is most beneficial for large, textual documents. When compressing a total SOAP message with multiple payloads in the body, there is no discrimination between small textual, large textual, binary, or pre-compressed payloads (such as JPEG images). The tradeoffs between processing time and benefit of compression become harder to predict. The decision to compress or not and when to break multipart messages into individual messages becomes more complex.
The STAR Web Services Guidelines assume that messages subject to HTTP compression will normally be XML-only documents.
Compression is NOT REQUIRED for all document transfers; however, if compression is agreed upon between trading partners, then the following requirements MUST be met.
It is REQUIRED at a minimum to use gzip as the algorithm for compression; other algorithms can be used as agreed upon by trading partners. At this time, algorithms other than gzip MUST be negotiated out of band between trading partners. In future, this negotiation of algorithm capabilities SHOULD be dynamic between web servers in the headers, or described in CPA and WS-Policy elements.
It is RECOMMENDED that:
Dynamic HTTP compression be used on Web Servers listening on HTTP endpoints that do not use SSL or transport level security. At this point, it is also REQUIRED that an agreement exist between both trading partners before implementing dynamic HTTP compression.
It is RECOMMENDED that static compression not be used on Web Servers listening on HTTP endpoints due to the dynamic nature of XML data.
Care MUST be taken when combining SSL and compression to ensure that SSL operates below the compression layer, so that payloads are compressed first and then encrypted.
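The reason SSL needs to sit below compression can be seen by trying to compress data that has no visible redundancy, which is what well-encrypted output looks like. In the sketch below, random bytes from os.urandom are used as a stand-in for ciphertext; this is an assumption made to keep the example dependency-free, and no real encryption is performed.

```python
import gzip
import os


def gzip_saving(data: bytes) -> float:
    """Fraction of bytes saved by gzip; zero or negative means no gain."""
    return 1.0 - len(gzip.compress(data)) / len(data)


# Hypothetical XML payload before encryption.
xml_payload = b"<PartNumber>12345</PartNumber>\n" * 5000

# Stand-in for the same payload after encryption: bytes with no visible pattern.
ciphertext_like = os.urandom(len(xml_payload))

saving_before_encryption = gzip_saving(xml_payload)
saving_after_encryption = gzip_saving(ciphertext_like)
```

The XML compresses to a small fraction of its size, while the ciphertext-like bytes do not shrink at all, so compression applied beneath SSL would only add processing cost.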
Architectures can be configured to support both SSL and HTTP compression in standard ways using network devices. While this document does not dictate any physical hardware or network infrastructure, the following is explicitly noted.
When hardware (card) based SSL processing is used, it is REQUIRED that the Web Server listening on the HTTP endpoint inherently support dynamic compression in addition to and along with SSL, either out of the box or through the use of third party plug-ins.
When network device based SSL processing is used, it MAY be possible to use HTTP compression on the web server in the same way as with plain HTTP. Since the Web Server listening on the HTTP endpoint is oblivious to the client and the encryption, it is REQUIRED to support dynamic compression.
The SOAP envelope of an ebMS message will never be compressed so that routing information can be available without the need for decompression.
In ebMS, the BOD payload will be compressed when the payload exceeds 1 megabyte (1 MB). The MIME content-type will indicate whether the payload attachment needs to be decompressed.
When building an outbound ebMS message, the SOAP envelope and the STAR BOD will each exist in their own MIME part according to the SOAP with Attachments (SwA) standard. The first MIME part will contain the SOAP header and body for routing of the attachments. This MIME part will never be compressed. The second and any additional MIME parts will consist of STAR BODs. These MIME parts will indicate whether the BOD is compressed based on the value of the content-type. A compressed BOD will have a content-type of application/gzip (the globally expected standard MIME description of a file compressed with gzip). A small BOD, less than 1 MB in size, will not be compressed and its MIME type will be application/xml (the globally expected MIME description for an XML document).
As the receiving endpoint processes these MIME parts, the first MIME part will always contain the ebMS SOAP envelope for routing information, while the second MIME part (and any additional MIME parts) will contain BODs. The MIME content-type will let the receiver know whether decompression is required before parsing the attached BOD. Any part with a content-type of application/gzip will be decompressed before it is parsed. If the content-type is application/xml, decompression is not required and the MIME part will be parsed as regular XML.
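The sender and receiver behaviour described above can be sketched with Python's standard email and gzip modules. This is a minimal illustration, not a conforming ebMS implementation: it omits Content-ID headers, the ebMS manifest, and transport headers, and the function names, the placeholder SOAP envelope, and the sample BODs are hypothetical. Only the 1 MB threshold and the two content-types come from the text.

```python
import gzip
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

ONE_MB = 1024 * 1024  # compression threshold stated in the guideline


def build_message(soap_envelope: str, bods: list, threshold: int = ONE_MB) -> bytes:
    """Build a multipart message: SOAP envelope first, then one part per BOD."""
    msg = MIMEMultipart("related")
    # First part: the SOAP envelope, never compressed.
    msg.attach(MIMEText(soap_envelope, "xml", "utf-8"))
    for bod in bods:
        if len(bod) > threshold:
            msg.attach(MIMEApplication(gzip.compress(bod), "gzip"))
        else:
            msg.attach(MIMEApplication(bod, "xml"))
    return msg.as_bytes()


def read_bods(raw: bytes) -> list:
    """Return the BODs from a received message, decompressing where indicated."""
    parts = message_from_bytes(raw).get_payload()
    bods = []
    for part in parts[1:]:  # parts[0] is the SOAP envelope
        data = part.get_payload(decode=True)
        if part.get_content_type() == "application/gzip":
            data = gzip.decompress(data)
        bods.append(data)
    return bods


small_bod = b"<SmallBOD/>"
large_bod = b"<Line>part</Line>\n" * 100000  # about 1.8 MB, above the threshold
raw_message = build_message("<Envelope/>", [small_bod, large_bod])
received = read_bods(raw_message)
```

The receiver never inspects the payload bytes to decide whether to decompress; the decision rests entirely on the content-type of each MIME part, as the text requires.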