Wednesday, February 20, 2008

On binary file format complexity of DOC and DWG

Microsoft published the binary file formats for Office a week ago.

Here is an explanation, "Why are the Microsoft Office file formats so complicated?", which makes these points:

  • There are a lot of optimizations in the file formats that are intended to make opening and saving files much faster.
  • They were not designed with interoperability in mind.
  • They have to reflect all the complexity of the applications.
  • They have to reflect the history of the applications.

To some extent this also applies to other binary formats such as DWG. One day Autodesk may decide, or be required, to open up its format just as Microsoft did. For now, the closest thing to a DWG specification is the one that the Open Design Alliance (ODA) has produced through reverse engineering: the OpenDWG specification, available in RTF format.

The general arrangement of data in an R13/R14/R15 file is as follows:
HEADER
  FILE HEADER
  DWG HEADER VARIABLES
  CRC
CLASS DEFINITIONS
PADDING (R13C3 AND LATER)
IMAGE DATA (PRE-R13C3)
OBJECT DATA
  All entities, table entries, dictionary entries, etc. go in this section.
OBJECT MAP
UNKNOWN SECTION (R13C3 AND LATER)
SECOND HEADER
IMAGE DATA (R13C3 AND LATER)
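
As a small illustration of reading the raw format, the FILE HEADER of these releases starts with a six-character ASCII version tag (AC1012 for R13, AC1014 for R14, AC1015 for R15). The Python sketch below reads only that tag and nothing else from the layout above. The function name and the lookup table are hypothetical names chosen for this example; they are not taken from the OpenDWG specification or from DwgInfoTip.

```python
import io

# Version tags for the releases covered by the layout above.
# Other releases use other tags and are reported as unknown here.
DWG_VERSIONS = {
    b"AC1012": "R13",
    b"AC1014": "R14",
    b"AC1015": "R15",
}


def read_dwg_version(stream):
    """Return (tag, release) read from the first six bytes of a DWG stream.

    Hypothetical helper for illustration; it does not parse the rest of
    the file header.
    """
    tag = stream.read(6)
    if len(tag) < 6 or not tag.startswith(b"AC"):
        raise ValueError("not a DWG file: missing 'AC' version tag")
    release = DWG_VERSIONS.get(tag, "unknown release")
    return tag.decode("ascii", "replace"), release


# Demonstration on in-memory bytes instead of a real drawing.
sample = io.BytesIO(b"AC1015" + bytes(10))
print(read_dwg_version(sample))
```

With a real drawing, the same function could be called on a file opened in binary mode, for example `open("drawing.dwg", "rb")`, where the file name is only a placeholder.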

DwgInfoTip, which I made, works directly with the raw binary format to extract DWG properties and show them as an infotip (tooltip) in Windows Explorer. AutoCAD does not even need to be installed for it to work. If AutoCAD is installed, you will also see the infotip in its file selection dialog boxes, such as Open and Save.

