Error in RegEx (atlrx.h) in Visual Studio C++ 2003 and 2005

During development we found that the Microsoft RegEx (regular expression) implementation contains a bug that caused our applications to crash. The application module using RegEx passed all unit tests, but under heavy usage the application sometimes crashed at a customer's site. Because we use BugTrap for error and crash notifications, we knew the error was in the atlrx.h file.

After several hours of testing and searching we found the bug. The crash did not occur the first time the code was executed; we had to run thousands of iterations of the same code before it appeared. The bug is located in the file atlrx.h at line 708.

The original code looks like this:

  case RE_ADVANCE:
    sz = CharTraits::Next(szCurrInput);
    szCurrInput = sz;
    if ( sz == NULL || *sz == '\0')
      goto Error;
    ip = 0;
    pContext->m_nTos = 0;
    break;

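The problem is that in some circumstances the variable szCurrInput is NULL (or points to an empty string) when this case is reached. The original code passes it to CharTraits::Next before checking anything, and this causes the crashes.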

The updated code with the bug fix:

  case RE_ADVANCE:
    if( szCurrInput == NULL || *szCurrInput == '\0' )
      goto Error;
    sz = CharTraits::Next(szCurrInput);
    szCurrInput = sz;
    if ( sz == NULL || *sz == '\0')
      goto Error;
    ip = 0;
    pContext->m_nTos = 0;
    break;

The fix adds the first two lines of the case. The szCurrInput variable has to be tested for NULL and for an empty string before CharTraits::Next is called. If szCurrInput is NULL or an empty string, RegEx processing has to stop; otherwise a stack overflow occurs while the string is being processed.
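
The same "check before you advance" ordering can be tried outside of ATL. The following is only a minimal sketch, not code from atlrx.h: NextChar is a hypothetical stand-in for CharTraits::Next, and AdvanceChecked is a made-up helper that returns NULL where the real code would jump to the Error label.

```cpp
#include <cstddef>

// Hypothetical stand-in for CharTraits::Next (assumption: plain single-byte
// characters). It steps to the following character and, like any such helper,
// must not be called on NULL or on the terminating '\0'.
static const char* NextChar(const char* p)
{
    return p + 1;
}

// Hypothetical helper mirroring the patched RE_ADVANCE case.
// Returns the advanced cursor, or NULL when matching has to stop
// (the point where the original code does "goto Error").
static const char* AdvanceChecked(const char* cur)
{
    // Guard added by the fix: never advance from NULL or from the terminator.
    if (cur == NULL || *cur == '\0')
        return NULL;

    const char* next = NextChar(cur);

    // Check that was already present: stop when the end of input is reached.
    if (next == NULL || *next == '\0')
        return NULL;

    return next;
}
```

Without the first check, NextChar would be called with a NULL pointer or would step past the terminator, and the second check would then read memory that does not belong to the string.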

Note

Some time later we ran into other problems with the Microsoft RegEx implementation and its non-standard RegEx syntax. So we dropped the MS RegEx parser and moved to Boost.Regex, which is a really nice piece of code (as are the other Boost libraries) and supports Perl and POSIX regular expressions. The whole Boost library is carefully unit tested and can be relied on.