MSG Conversion Issues

This technical note describes how MSG files are converted in Lexbe eDiscovery Platform and as part of Services processing. It also covers how to handle possible data corruption in these file types.

NOTE. As part of our automated file processing, our system first extracts the files and attempts to convert them to PDF. We do not automatically repair corrupted MSGs, because repair involves manual intervention that might stall the processing queue.

What are MSG Files

MSG is the file extension for Microsoft Outlook and Exchange mail documents. MSG files are often included inside PST or OST files, which contain multiple MSGs and metadata, principally sender, recipients, date and time sent, and subject. Date and time are recorded in Universal Time (UT, formerly GMT), and the local time offset of the viewing computer is used to display the email message in local time. An MSG file may be encoded in either binary or ASCII. The message body may be encoded in formats including plain text, RTF, Word and HTML (with graphic links). MSGs may alternatively represent calendar items, contacts, and reminders.

For each MSG, selected metadata fields are extracted and stored in fields associated with the MSG. A UT (GMT) offset that the user inputs at the time of upload is used to set the time of the email to a designated local time, rather than UT (GMT). The email body and attachments are extracted and then associated with the MSG to allow integrated viewing. The email body is converted to PDF, as are the email attachments.

Email bodies are named with Date - Time - Subject, as follows:

YYYY-MM-DD HH:MM:SSPM - Email Subject Line
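
As an illustration of this naming pattern, the following Python sketch builds a title from a sent time recorded in UT and a user-supplied offset. The function name email_body_title and its parameters are hypothetical and are not part of the Lexbe platform; the sketch only mirrors the Date - Time - Subject pattern shown above and assumes a zero-padded 12-hour clock.

```python
from datetime import datetime, timedelta, timezone


def email_body_title(sent_utc: datetime, subject: str, offset_hours: float) -> str:
    """Build a 'YYYY-MM-DD HH:MM:SSPM - Subject' title (hypothetical helper)."""
    # Shift the UT timestamp to the designated local time.
    local = sent_utc.astimezone(timezone(timedelta(hours=offset_hours)))
    # Derive the 12-hour clock value and AM/PM suffix without relying on locale.
    hour12 = local.hour % 12 or 12
    suffix = "AM" if local.hour < 12 else "PM"
    stamp = f"{local:%Y-%m-%d} {hour12:02d}:{local:%M:%S}{suffix}"
    return f"{stamp} - {subject}"


sent = datetime(2010, 1, 1, 8, 0, 0, tzinfo=timezone.utc)
print(email_body_title(sent, "Upcoming Meeting", -6))
# prints: 2010-01-01 02:00:00AM - Upcoming Meeting
```

Note that an offset can move a message across a date boundary, so the date in the title may differ from the UT date stored in the MSG.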

If a file cannot be converted to PDF and the conversion failure is detected, a placeholder is generated. Native versions of the body and attachments are accessible in the Original tab in the Document Viewer, and in the Original tab of a Lexbe Briefcase Download or Production. For the email body, a PDF version is generated as the 'Original' native version. Many emails are now encoded in HTML with numerous links to web-served graphic files, and a pure native HTML version might display poorly if those links are broken.

How Are Email Body and Attachment Relationships Represented in the eDiscovery Platform

After processing, several versions of an MSG email body and its attachments are available for viewing in the Document Viewer>Doc tab.

For the email body, a PDF version is available in the PDF tab, and text from the PDF is available in the Text tab. A download link to the MSG is available in the Original tab and extracted text from the MSG is available in the HTML tab.

When viewing an email body in the Document Viewer, each attachment is available in the Related Document window and can be opened in the same Document Viewer or a new one (for a new Document Viewer, right-click and select ‘New Window’). When viewing an attachment, the other related attachments are available in the same fashion.

From the Browse or Search pages, MSGs may be viewed in isolation by filtering on File Extension = MSG. Also from the Browse page, you can see if documents are processed email attachments by showing the ‘Is Attachment’ column.

To see MSG email bodies and attachments in order in Browse or Search, either Bates numbers or Control numbers must be applied. This is because email MSGs are named with metadata information (see above) and attachments are named with the attachment file name, so a normal title sort puts the documents out of order. Control numbers can be applied to a case at any time. Bates numbers are applied to a case at the time of a production. In either case, the Control or Bates numbers place the attachments in order after the MSG message body, so a Control number or Bates number sort from the Browse window allows MSG email bodies and their attachments to be reviewed in order.
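
The effect of the two sort orders can be seen in a small Python sketch. The records, the Doc type, and the CTRL0000001-style control number format are all made up for illustration and do not reflect the platform's actual numbering scheme.

```python
from typing import NamedTuple


class Doc(NamedTuple):
    title: str
    control_number: str  # hypothetical format, assigned in family order


# Two emails, each followed by its attachments, as numbered after processing.
docs = [
    Doc("2010-01-01 02:00:00AM - Upcoming Meeting", "CTRL0000001"),
    Doc("Meeting Agenda.pdf", "CTRL0000002"),
    Doc("Attendance List.pdf", "CTRL0000003"),
    Doc("2010-01-02 09:30:00AM - Budget Review", "CTRL0000004"),
    Doc("Budget Draft.xlsx", "CTRL0000005"),
]

# A title sort groups the date-named email bodies first and scatters attachments.
by_title = sorted(docs, key=lambda d: d.title)

# A control number sort keeps each attachment directly after its email body.
by_control = sorted(docs, key=lambda d: d.control_number)

for doc in by_title:
    print(doc.control_number, doc.title)
```

In the title sort, both email bodies come first and the three attachments follow alphabetically, detached from their parent messages. The control number sort restores the body-then-attachments order.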

Why Might MSGs Fail to Convert Properly

As with all native files in processing (e.g., Word, Excel, PowerPoint), a certain number of MSGs may fail to convert properly. The reasons include MSG corruption and malformation. MSGs can be particularly prone to corruption because many programs can convert files to MSGs and sometimes do so incorrectly. There are also many versions of the MSG format in existence as Outlook has developed, and some versions may not be supported by all programs that read or convert them. In addition, the email body may be encoded in a number of different formats (e.g., HTML, text in various formats, Microsoft Word in various formats, RTF), which can lead to encoding and conversion problems, and corruption. Finally, emails can become corrupted in transit over the Internet or during file copying.
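
A quick local check can sometimes separate grossly malformed files from ones that are at least structurally plausible. The Python sketch below assumes the MSG was saved in the Compound File Binary container that Outlook uses for .msg files, and it tests only the 8-byte signature at the start of the file. The function names are hypothetical, this is not how the Lexbe platform detects corruption, and passing the check does not mean the message will convert.

```python
# Signature found at the start of a Compound File Binary (OLE2) container.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def has_cfb_signature(header: bytes) -> bool:
    """Return True if the given leading bytes start with the CFB signature."""
    return header[:8] == CFB_SIGNATURE


def looks_like_msg_container(path: str) -> bool:
    """Read the first 8 bytes of a file and test them (hypothetical helper)."""
    with open(path, "rb") as fh:
        return has_cfb_signature(fh.read(8))
```

A file that fails this test was probably truncated, renamed from another format, or written incorrectly by a converting program. A file that passes may still have a damaged body or attachments deeper in the container.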

Examples of email failing to convert properly include:
>Email attachments failing to extract
>Email metadata failing to extract and field correctly
>Email bodies failing to extract

Why Might I See an Email Header That Looks OK but Not the Body

The rendition of an email body you see in Outlook or another email viewing program, with "To", "From", "Date", etc. at the top of the page and the email body underneath, looks like a standard Word document. In actuality, the header fields are pulled from metadata and the view is constructed in the email viewer. If the body is corrupt and cannot be extracted, then the header may show but not the body. Often the text can be extracted from a corrupt or partially corrupt email, viewed in the HTML tab in the Lexbe eDiscovery Platform viewer, and searched.

It’s possible that an email might display in a version of Outlook and not be extractable in Lexbe eDiscovery Platform, and vice versa. An email might also be viewable in one version of Outlook and not another.

What Can I Do with Outlook Emails that Fail to Convert

If an email fails to convert during uploading, it may be possible to download the original MSG that did not convert, open it, and convert it to PDF locally with Outlook or with other software or utilities. In this case you can download the MSG from the Original tab of the Document Viewer or as a Briefcase download from Browse.

Additionally, for emails where the email body has failed to extract and convert, our search index has often been able to extract a text version. This will be unformatted and may be incomplete, but often has all needed data. This version is available in the HTML view and can be downloaded as text or saved to PDF with a local PDF print driver.

Convert Native MSGs to PDF Locally and Re-Upload

Alternatively, you can download email MSGs from the Original tab, convert the email body or attachments as needed to PDF locally using third-party software, and then re-upload the PDF version of the email to Lexbe eDiscovery Platform. Here are the steps:

1-From the Browse or Search pages, select the email that you wish to download, and then download the native version of the file from the Original tab. This will download the MSG, which includes the email body and attachments.

2-Convert the body or attachments as needed to PDF, using PDF saving or printing software. This can be done with Acrobat Pro and numerous PDF print driver utilities. The email body and attachments should be saved/printed to PDF separately.

3-Name the email body and any attachments consistently so that they group in a title sort. E.g.:
>2010-01-01 2:00:00AM Email Re Upcoming Meeting.pdf (this is the email body)
>2010-01-01 2:00:00AM Email Re Upcoming Meeting, Meeting Agenda.pdf (this is attachment 1)
>2010-01-01 2:00:00AM Email Re Upcoming Meeting, Attendance List.pdf (this is attachment 2)

4-Re-upload the converted PDFs to the case, and then transfer any metadata needed from the original email body to the replacement (e.g., Date Sent, From, To, Cc, Bcc, Subject).

5-We recommend doing these one at a time to keep emails and attachments consistent.
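
The naming convention in step 3 can be sketched in Python as follows. The helper name grouped_pdf_names is hypothetical, and the sketch simply reproduces the example titles from step 3 by giving the body and each attachment a shared prefix.

```python
def grouped_pdf_names(stamp: str, subject: str, attachment_names: list[str]) -> list[str]:
    """Return PDF titles for an email body and its attachments (hypothetical helper)."""
    # The shared date, time and subject prefix is what groups the family in a title sort.
    prefix = f"{stamp} {subject}"
    names = [f"{prefix}.pdf"]  # the email body
    names += [f"{prefix}, {name}.pdf" for name in attachment_names]
    return names


for title in grouped_pdf_names(
    "2010-01-01 2:00:00AM",
    "Email Re Upcoming Meeting",
    ["Meeting Agenda", "Attendance List"],
):
    print(title)
```

These titles follow the example in step 3 as written. Colons are not valid in Windows file names, so when saving the PDFs on Windows the time separator would need to be replaced with another character.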

We also offer manual conversion to our customers as part of our eDiscovery Consulting and Technical Services.