Bavi_H's Blog

Notepad bug: Saving with word wrap on inserts CR CR LF characters in the display window


by Bavi_H, 7/27/2008

When you press the Enter key on a Windows computer, two characters are actually stored: a carriage return (CR) followed by a line feed (LF). The operating system always interprets the character sequence CR LF the same way as the Enter key: it moves to the next line. However, stray CR or LF characters on their own can sometimes cause problems.
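As a small illustration, here is a Python sketch (Python is used only for demonstration and has nothing to do with Notepad itself) that builds two lines separated by CR LF and prints the two character codes of the line break in hex. The variable names are made up for this example.

```python
# A Windows line break is the two-character sequence CR LF.
CR = "\r"   # carriage return, character code 13 (hex 0D)
LF = "\n"   # line feed, character code 10 (hex 0A)

text = "first line" + CR + LF + "second line"

# Encode as ASCII and pick out the two bytes that make up the line break.
data = text.encode("ascii")
break_at = data.index(b"\r\n")
line_break_hex = " ".join(f"{b:02x}" for b in data[break_at:break_at + 2])

print(line_break_hex)
```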

There is a bug in the Windows XP version of Notepad that can cause extra CR characters to be stored in the display window. The bug happens in the following situation:

If you have the word wrap option turned on and the display window contains long lines that wrap around, then saving the file causes Notepad to insert the characters CR CR LF at each wrap point in the display window, but not in the saved file.

The CR CR LF characters can cause oddities if you copy and paste them into other programs. They also prevent Notepad from properly re-wrapping the lines if you resize the Notepad window.

You can remove the CR CR LF characters by turning off the word wrap feature, then turning it back on if desired. However, the cursor is repositioned at the beginning of the display window when you do this.

Update 11/14/2010: If you are familiar with a hex editor, you can modify one bit in notepad.exe to prevent Notepad from inserting CR CR LF characters at wrap points.

Notes

Any place where the Enter key was pressed still has the normal CR LF characters.

The CR CR LF characters are only in Notepad's display window. The CR CR LF characters were not saved in the text file. The long wrapped lines were saved unbroken.

After saving with word wrap on, the cursor may move, long lines may be re-wrapped, and Notepad may not repaint its window properly.

Technical details

When turning on word wrap, Notepad wraps all long lines in the display window without inserting any new CR CR LF sequences.

When turning off word wrap, every CR CR LF sequence in the display window is removed.

If word wrap is turned off when saving a file, all characters in the display window are saved.

If word wrap is turned on when saving a file, every CR CR LF sequence in the display window is removed before the file is saved. After the file is saved, all the long lines in the display window are re-wrapped and new CR CR LF sequences are inserted at the wrap points.
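The save behavior described above can be modeled roughly in Python. This is only a sketch, not Notepad's actual code: the function names are hypothetical, and it assumes a fixed column width with breaks after the last space that fits, whereas the real Notepad wraps according to the pixel width of its window.

```python
SOFT_BREAK = "\r\r\n"   # CR CR LF, the sequence inserted at wrap points
HARD_BREAK = "\r\n"     # CR LF, produced by the Enter key


def strip_soft_breaks(display):
    """Remove every CR CR LF sequence, leaving ordinary CR LF breaks alone."""
    return display.replace(SOFT_BREAK, "")


def insert_soft_breaks(text, width):
    """Re-wrap each paragraph at `width` columns, marking wrap points with CR CR LF.

    Assumption: a line breaks just after the last space that fits in `width`
    characters, or mid-word if there is no such space.
    """
    wrapped = []
    for paragraph in text.split(HARD_BREAK):
        rows = []
        while len(paragraph) > width:
            cut = paragraph.rfind(" ", 0, width) + 1   # break just after a space
            if cut == 0:
                cut = width                            # no space found: break mid-word
            rows.append(paragraph[:cut])
            paragraph = paragraph[cut:]
        rows.append(paragraph)
        wrapped.append(SOFT_BREAK.join(rows))
    return HARD_BREAK.join(wrapped)


def save_with_word_wrap(display, width):
    """Return (file_contents, new_display) for a save with word wrap on."""
    file_contents = strip_soft_breaks(display)               # saved without CR CR LF
    new_display = insert_soft_breaks(file_contents, width)   # display gets new CR CR LF
    return file_contents, new_display


typed = "the quick brown fox jumps over the lazy dog\r\nend"
saved, display = save_with_word_wrap(typed, 16)
resaved, display2 = save_with_word_wrap(display, 30)
```

In this model the saved file never contains CR CR LF, no matter how many times the display window is re-wrapped; only the display text changes.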

After saving with word wrap on, the cursor may move. The number of characters from the beginning of the display window to the cursor position remains the same before and after the save; each CR and LF character is counted as a separate character. If the CR CR LF sequences removed before the cursor and the CR CR LF sequences inserted before the cursor don't add up to the same number of characters, then the cursor will move relative to the text.
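Here is a rough Python sketch of why a fixed character count can land on a different spot in the text. The helper name `text_position` and the sample strings are invented for illustration, and the helper assumes the offset does not fall in the middle of a CR CR LF sequence.

```python
SOFT_BREAK = "\r\r\n"   # CR CR LF inserted at a wrap point


def text_position(display, offset):
    """Count the characters before `offset`, ignoring CR CR LF sequences.

    Assumes `offset` does not fall inside a CR CR LF sequence.
    """
    return len(display[:offset].replace(SOFT_BREAK, ""))


# The same text before and after a save inserts one CR CR LF at a wrap point.
before = "aaaa bbbb cccc dddd"
after = "aaaa bbbb \r\r\ncccc dddd"

cursor_offset = 15   # characters from the start of the display window

position_before = text_position(before, cursor_offset)
position_after = text_position(after, cursor_offset)
print(position_before, position_after)
```

The offset stays at 15, but after the save three of those 15 characters are the inserted CR CR LF, so the cursor sits three characters earlier in the visible text.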

If you are saving a new Untitled window for the first time, or if you use the Save As command to save, then Notepad repaints its display window.

If you are re-saving an existing file, then Notepad doesn't repaint its window. This is particularly confusing if the lines were re-wrapped. One way you can force the display to repaint is by switching to another window then switching back to Notepad.

When moving the cursor using the arrow keys or by clicking the mouse, Notepad treats CR LF and CR CR LF sequences as indivisible units. But when a file is saved with word wrap on, Notepad may end up repositioning the cursor in the middle of one of these sequences. If you start typing, you may end up creating stray CR or LF characters, each of which appears as a box.
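The following Python sketch shows how typing in the middle of a CR CR LF sequence leaves stray characters behind. It only models the text buffer; the function names are hypothetical, and "stray" here means a CR or LF that is not part of a complete CR LF or CR CR LF sequence.

```python
import re


def type_at(display, offset, typed):
    """Insert `typed` into the display text at character offset `offset`."""
    return display[:offset] + typed + display[offset:]


def stray_line_characters(display):
    """Return the offsets of CR or LF characters outside CR LF / CR CR LF."""
    strays = []
    # Try the complete sequences first; a one-character match is a stray.
    for match in re.finditer(r"\r\r\n|\r\n|[\r\n]", display):
        if len(match.group()) == 1:
            strays.append(match.start())
    return strays


display = "one two \r\r\nthree"          # CR CR LF at a wrap point
cursor = 10                              # between the CR CR pair and the LF
broken = type_at(display, cursor, "x")   # "one two \r\rx\nthree"

print(stray_line_characters(display))
print(stray_line_characters(broken))
```

Before typing there are no stray characters; afterwards both CR characters and the LF are strays, which matches the boxes seen in Notepad.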

Example

In the following example, normally invisible characters are shown in pink. A centered dot represents a space, a left arrow represents a CR, and a down arrow represents an LF.

  1. Open Notepad, turn on the word wrap option and enter some text. Note the number of characters from the beginning of the display window to the cursor is 139.
  2. Save the file. Notepad has inserted CR CR LF characters where the long line wraps. Notice the number of characters from the beginning of the display window to the cursor is still 139, but because of the extra CR CR LF characters, the cursor has moved left compared to step 1.
  3. Resize the Notepad window to a larger width. Notice Notepad doesn't properly re-wrap the long line.
  4. Save the file again. Notepad removed the CR CR LF characters, saved the file, re-wrapped the long line, and inserted new CR CR LF characters, but the display window hasn't yet been repainted. The cursor has moved again, but the true position of the cursor isn't obvious yet because the window hasn't been repainted.
  5. Switch to another window, then switch back to Notepad. Notepad repaints the display window. Notice the number of characters from the beginning of the display window to the cursor is still 139, but now the cursor has moved right compared to step 2.
    1. Select all of the text in Notepad and copy it. Create a new e-mail in Outlook Express and paste the text into it. The CR CR LF characters cause Outlook Express to double space the text.
    2. Open WordPad and paste the text into it. The CR CR LF characters cause WordPad to insert an extra space.
  6. In Notepad, turn off the word wrap option, then turn it back on. Notepad removed the CR CR LF characters and positioned the cursor at the beginning of the display window.
    1. Copy all the text from Notepad and paste it into Outlook Express. The text is now properly pasted as word-wrapped lines.
    2. Paste the text into WordPad. The text is now properly pasted without extra spaces.
  7. In Notepad, place the cursor two characters after the first wrap point in a paragraph. (The number of characters from the beginning of the display window to the cursor is now 57.)
  8. Save the file. Notepad inserted CR CR LF characters at the wrap points. The cursor moved and is now in between CR CR characters and an LF character. (The number of characters from the beginning of the display window to the cursor is still 57.)
  9. Start typing. The CR CR characters and the LF character become separated and now appear as boxes.
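If you want to reproduce the markers used in this example on your own text, a minimal Python sketch might look like this. The function name is hypothetical; the symbols follow the convention described above (a centered dot for a space, a left arrow for CR, a down arrow for LF).

```python
# Marker characters: centered dot, left arrow, down arrow.
MARKERS = {
    " ": "\u00b7",    # space
    "\r": "\u2190",   # CR
    "\n": "\u2193",   # LF
}


def show_invisibles(text):
    """Replace spaces, CR and LF with visible marker characters."""
    return "".join(MARKERS.get(ch, ch) for ch in text)


sample = "long line \r\r\nwraps here\r\nnext"
marked = show_invisibles(sample)
print(marked)
```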

Scripts

Here are some relevant AutoHotkey scripts.

Show clipboard

The following AutoHotkey script will show the contents of the clipboard in a system tray balloon whenever the clipboard is changed. Control characters are decoded; for example, CR appears as ^M and LF appears as ^J.

[Show clipboard screenshot]

This script is useful to experiment with the way Notepad behaves by copying text from Notepad to see what the invisible characters are.

show-clipboard.ahk
#Persistent

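OnClipboardChange:

  x := Clipboard
  n := StrLen(x)
  out := ""

  ; Examine the clipboard text one character at a time.
  Loop, %n%
  {
    c := SubStr(x, A_Index, 1)
    a := Asc(c)
    if( a < 32 )
    {
      ; Show a control character in caret notation: CR (13) is ^M, LF (10) is ^J.
      b := Chr(64 + a)
      out := out . "^" . b
      if( a = 10 )
      {
        ; Start a new line in the balloon after each LF.
        out := out . "`r`n"
      }
    }
    else
    {
      out := out . c
    }
  }

  TrayTip ,Clipboard contents,%out%

return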
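
For readers who don't have AutoHotkey installed, the decoding loop can be tried out in Python. This is a loose port for experimenting, not a replacement for the script: the function name is made up, it works on an ordinary string rather than the clipboard, and it ends each display line with a plain LF instead of CR LF.

```python
def caret_notation(text):
    """Show control characters (codes below 32) in caret notation."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 32:
            # Code 13 (CR) becomes ^M, code 10 (LF) becomes ^J, and so on.
            out.append("^" + chr(64 + code))
            if code == 10:
                out.append("\n")   # start a new display line after each LF
        else:
            out.append(ch)
    return "".join(out)


decoded = caret_notation("wrapped \r\r\nline\r\n")
print(decoded)
```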

Modify clipboard

This AutoHotkey script will remove any CR CR LF sequences and other stray CR and LF characters from the clipboard whenever you press Ctrl+C or Ctrl+X in Notepad. Only CR LF sequences will remain unchanged.

This script may be helpful if you often use Notepad with word wrap on, and want to copy long lines without the CR CR LF characters.

(Update 11/14/2010: If you are familiar with a hex editor, you can instead modify one bit in notepad.exe to prevent Notepad from inserting CR CR LF characters at wrap points.)

modify-clipboard.ahk
#IfWinActive ahk_class Notepad

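^c::
  ; Empty the clipboard first so ClipWait waits for the new copy to arrive.
  Clipboard := ""
  Send ^{Ins}
  ClipWait, 1
  if( !ErrorLevel )
  {
    Gosub CorrectClip
  }
return

^x::
  Clipboard := ""
  Send +{Del}
  ClipWait, 1
  if( !ErrorLevel )
  {
    Gosub CorrectClip
  }
return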

#IfWinActive


CorrectClip:

  x := Clipboard
  n := StrLen(x)
  out := ""
  cr_count := 0

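  Loop, %n%
  {
    c := SubStr(x, A_Index, 1)
    a := Asc(c)
    if( a = 13 )
    {
      ; Hold back each CR until we see what follows it.
      ++cr_count
    }
    else if( a = 10 )
    {
      ; Keep an LF only when exactly one CR precedes it (a normal CR LF).
      ; CR CR LF sequences and lone LF characters are dropped.
      if( cr_count = 1 )
      {
        out := out . "`r`n"
      }
      cr_count := 0
    }
    else
    {
      ; Any other character is copied; CRs held back before it are dropped.
      out := out . c
      cr_count := 0
    }
  }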

  Clipboard := out

return
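
To see exactly which sequences the CorrectClip routine keeps and which it drops, here is a loose Python port of its loop that you can run on sample strings. The function name is hypothetical and it works on an ordinary string rather than the clipboard.

```python
def correct_clip(text):
    """Keep normal CR LF breaks; drop CR CR LF and any stray CR or LF."""
    out = []
    cr_count = 0   # number of CR characters seen since the last other character
    for ch in text:
        if ch == "\r":
            cr_count += 1
        elif ch == "\n":
            if cr_count == 1:
                out.append("\r\n")   # exactly one CR before the LF: a real line break
            cr_count = 0
        else:
            out.append(ch)
            cr_count = 0
    return "".join(out)


copied = "a long line \r\r\nthat wraps\r\nnext paragraph"
cleaned = correct_clip(copied)
print(repr(cleaned))
```

Note that a wrap point is removed without adding a space, so the words on either side stay joined by whatever space was already in the text.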

Links

Fix Notepad's CR CR LF bug with a hex editor
bavih.blogspot.com/2010/11/fix-notepad-bug.html
Follow-up post that shows how to modify notepad.exe with a hex editor to prevent Notepad from inserting CR CR LF characters at wrap points.

AutoHotkey
www.autohotkey.com
A keyboard shortcut, macro, and script processor for Windows.

Revised 9/14/2009: Added clarifications in title and introduction that CR CR LF characters are only inserted in the display window.
Update 1/24/2010: Updated Modify Clipboard script.
Update 11/14/2010: Added links to follow-up post that shows how to modify Notepad with a hex editor to prevent the bug.
Revised 7/24/2011: In step 7, changed "two characters after a wrap point" to "two characters after the first wrap point in a paragraph".