Troubleshoot environment issues

For clients that will be hosting their own Optimizely Configured Commerce instance, we suggest that they review some of our best practices and guidance relative to managing the site. While we may call out certain things, such as server and database health, we are not making specific recommendations in these areas. Rather, we are focusing on those areas that are most meaningful to managing a Configured Commerce site.

This is not a comprehensive guide, but it touches on the most common concerns.

Troubleshoot DNS issues

1. Use an external DNS lookup service to see if your DNS resolves

2. Check the status of your DNS provider. If it is a large provider like Network Solutions, a Twitter search can show you whether others are experiencing issues with the same provider.
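
As a complement to an external lookup, the following minimal sketch checks whether a hostname resolves from the machine you are on. It uses only the Python standard library. The hostnames are placeholders, and the resolver parameter is a hypothetical convenience added here so the function can be exercised without network access.

```python
import socket
from typing import Callable, List


def resolve_host(hostname: str,
                 resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted, de-duplicated IP addresses that hostname resolves to.

    An empty list means the name did not resolve from this machine.
    """
    try:
        results = resolver(hostname, None)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

Calling resolve_host with your storefront hostname on a web server and again on a workstation outside the network can help show whether a failure is local or affects everyone.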

Troubleshoot application-level issues

Try to load simpleping.aspx directly on one of the web servers. This should display a detailed .NET error page instead of the friendly error page.

  1. RDP directly to the server
  2. Open a browser window
  3. Enter localhost/simpleping.aspx
    • You may need to use a custom port such as localhost:[portnumber]/simpleping.aspx if you have more than a single website set up in IIS
  4. If you are not getting a response, there may be an IIS or server-level issue.
    1. First try recycling the application pool
    2. Next try restarting IIS
    3. Finally try restarting the web server
  5. If you see a .NET error page, the following error messages indicate an issue with session state:
    1. If the .NET error page includes the following text, a SQL session error is indicated: "Unable to connect to SQL Server session database."
      This usually indicates a database issue; follow the steps in troubleshooting database issues. Note that the connection information for the session connection is located at ~/config/sessionState.config on the web server and may differ from the regular connection string information at ~/config/connectionStrings.config.
      The location of Session SQL errors is highlighted in the screenshot below. To access these logs, log in to the admin panel and then add /elmah to the end of the site URL, like this: [website URL]/elmah.

    2. The error text below indicates an issue with the session state Windows service and includes instructions for troubleshooting:

    Unable to make the session state request to the session state server. Please ensure that the ASP.NET State service is started and that the client and server ports are the same. If the server is on a remote machine, please ensure that it accepts remote requests by checking the value of HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\aspnet_state\Parameters\AllowRemoteConnection. If the server is on the local machine, and if the before mentioned registry value does not exist or is set to 0, then the state server connection string must use either 'localhost' or '' as the server name.

  6. If you see the message "The WIS running on the server [ServerName] has not connected since [DateTime]", then the issue is with WIS; see troubleshooting WIS issues.

    The RC_IntegrationConnectionHistory table stores history for all WIS connections. If you rename a server or uninstall WIS from a server, the simpleping.aspx page will start failing shortly afterward. To clear the history, run the following command against your database:
    DELETE FROM RC_IntegrationConnectionHistory

  7. Use the troubleshooting mode of simpleping.aspx to get more information. Using the same steps as above, change the URL to localhost/simpleping.aspx?troubleshooting=true
    1. Failure querying the sql server database - this text will display followed by an exception message if the issue is with querying the database. The exception message may provide more information. See troubleshooting database issues.
    2. Failure querying the Lucene index - this text will display followed by an exception message if there is an issue querying the Lucene index. These problems can usually be resolved by rebuilding the Lucene index in the Admin Console. They usually leave the site in a semi-functional state, but simpleping monitors them because products and orders rely on Lucene, so the index is considered critical to site health.
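
The triage in the steps above can be sketched as a small lookup from the marker text quoted in this article to the area to investigate next. This is an illustration only, not part of the product. The http scheme in the URL helper and the short area labels are assumptions made for the example.

```python
from typing import Optional

# Marker text quoted in this article, mapped to the area to investigate.
# Matching is case-insensitive; the first matching marker wins.
MARKERS = [
    ("unable to connect to sql server session database", "session database"),
    ("unable to make the session state request", "session state service"),
    ("has not connected since", "WIS"),
    ("failure querying the sql server database", "database"),
    ("failure querying the lucene index", "Lucene index"),
]


def troubleshooting_url(host: str = "localhost",
                        port: Optional[int] = None) -> str:
    """Build the simpleping troubleshooting URL (http scheme is assumed)."""
    authority = f"{host}:{port}" if port is not None else host
    return f"http://{authority}/simpleping.aspx?troubleshooting=true"


def classify(body: str) -> Optional[str]:
    """Return the area suggested by the page text, or None if unrecognized."""
    lowered = body.lower()
    for marker, area in MARKERS:
        if marker in lowered:
            return area
    return None
```

A None result means the page text did not contain any of the messages listed here, so the exception details on the page itself are the next thing to read.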

Troubleshooting Database Issues

If the application cannot connect to the SQL Server, use the following steps to determine what might be happening. In general, do not use SQL Server Management Studio (SSMS) directly on the SQL Server.

  1. Ping the SQL Server directly from the web server to rule out a network issue
    1. If there is no response, determine whether this is a network issue or the server is down
  2. Run SQL Server Management Studio and see if you can connect to the SQL Server
    1. First establish the connection
    2. Run a simple query on the Configured Commerce database such as "Select count(*) from website (NOLOCK)" to see if you get a result
    3. If you cannot connect or get a result, move on to the next step
  3. Remote to the SQL server itself to ensure it is up and running and that CPU is not pegged
    1. If CPU is pegged, attempt to find the process that is pegging CPU and kill it

      1. Use Activity Monitor on the database (from SQL Server Management Studio) and inspect processes that may be in a "runaway" condition

      2. Note that it is typical for SQL Server to use most of the available memory on the server - this is likely not the problem (> 98% may indicate a problem)

      3. Kill selected processes to see if you can restore the SQL Server to normal activity levels

    2. If SQL Server is not accessible, it may require a full reboot at the data center
  4. If everything looks OK on the server using SSMS, then check that the user/password in the connection string is valid
    1. Go to the web application root folder/config/connectionStrings.config
    2. Visually validate that the server and login look correct
    3. If using integrated security, make sure that the password for the account has not expired

      1. The user is based on the IIS application pool for the Configured Commerce web application

        1. Log onto the web server
        2. Launch IIS Manager and note the identity associated with the application pool
    4. If not using integrated security, try to log into the server using SSMS with the username and password shown in the config

  5. If the errors subside and recur on an occasional basis, make sure that the database itself is healthy (see Database Health & Maintenance)
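
The connection string check in step 4 can be supported with a small script that prints the server, login, and security mode without echoing the password. The sketch below assumes the file uses the standard ASP.NET connectionStrings layout (add elements with name and connectionString attributes). The sample content, including the names Example, SQL01, and Commerce, is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical sample in the standard ASP.NET connectionStrings layout.
SAMPLE_CONFIG = """<connectionStrings>
  <add name="Example" connectionString="Data Source=SQL01;Initial Catalog=Commerce;Integrated Security=True" />
</connectionStrings>"""


def parse_connection_string(value: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value' into a dict with lower-cased keys.

    Naive on purpose: quoted values containing ';' are not handled.
    """
    pairs = {}
    for part in value.split(";"):
        if "=" in part:
            key, _, val = part.partition("=")
            pairs[key.strip().lower()] = val.strip()
    return pairs


def summarize(config_xml: str) -> Dict[str, dict]:
    """Map each connection name to its server, user, and security mode."""
    root = ET.fromstring(config_xml)
    summary = {}
    for add in root.iter("add"):
        fields = parse_connection_string(add.get("connectionString", ""))
        server = fields.get("data source") or fields.get("server", "")
        user = fields.get("user id") or fields.get("uid", "")
        integrated = fields.get("integrated security", "false").lower() in (
            "true", "yes", "sspi")
        # The password is deliberately left out of the summary.
        summary[add.get("name", "")] = {
            "server": server, "user": user, "integrated": integrated}
    return summary
```

When integrated is True, the login to verify is the application pool identity; otherwise it is the user shown in the summary.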


Configured Commerce Servers

Logs are stored in the following locations:

  • The Elmah_Error database table contains detailed information about any exceptions that occur in the Configured Commerce application.
  • The ApplicationLog database table contains details for debugging, information, errors, and warnings. The level is indicated by the Type column. The module that generated the error is included in the Source column.
  • ~/App_Log/logfile.txt contains nearly the same information as the ApplicationLog database table. Each web server needs to have a separate file. If running in a load-balanced environment, ensure ~/App_Log is not set up with DFS. This file is not configured out of the box. To enable it, turn it on in the ~/App_Config/log4net.config file. See the log4net documentation for more details.

WIS Servers

The WIS logs to [InstalledLocation]/logfile.txt. This file includes info, debug, warning, and error messages. Some examples of log types:

  • Info
    • Jobs starting and completing will have an entry
    • Starting the service will log some information about integration processors that are loaded
  • Debug
    • Usually debug messages added to integration jobs to aid developers in tracking down problems
  • Error
    • Usually indicates a problem running an integration job or a problem connecting to the Configured Commerce web service. Details of common connection problems exist in the Troubleshooting WIS section.
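
When a logfile.txt grows large, filtering it by level can make the error entries easier to find. The sketch below assumes each line contains its level as a separate token such as ERROR or WARN, optionally in square brackets. The actual layout depends on your log4net configuration, and the sample lines used to exercise it are hypothetical.

```python
from typing import Iterable, List, Sequence


def filter_by_level(lines: Iterable[str],
                    levels: Sequence[str] = ("ERROR", "WARN")) -> List[str]:
    """Keep lines that contain one of the level tokens as a whole word."""
    wanted = set(levels)
    kept = []
    for line in lines:
        # Treat [ERROR] and ERROR alike by turning brackets into spaces.
        tokens = line.replace("[", " ").replace("]", " ").split()
        if wanted.intersection(tokens):
            kept.append(line.rstrip("\n"))
    return kept
```

Because the match is on whole tokens, a debug line that merely mentions a word like ERRORS is not reported as an error.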

General Server Health

It is important to ensure that your overall environment is running smoothly. You should have at least the following processes and monitors in place:

  • CPU not pegging
  • Disk space not running out
  • Memory not swapping too much
  • Patching servers regularly
  • Intrusion detection in place
  • No single point of failure if possible
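
As one example of such a monitor, the following sketch checks free disk space with the Python standard library. The 10% threshold is an arbitrary assumption for illustration, not a recommendation from this guide.

```python
import shutil


def free_fraction(total_bytes: int, free_bytes: int) -> float:
    """Return the free space as a fraction of the total."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def disk_is_low(path: str, minimum_free: float = 0.10) -> bool:
    """Return True when the volume holding path has less free space
    than minimum_free (a fraction between 0 and 1)."""
    usage = shutil.disk_usage(path)
    return free_fraction(usage.total, usage.free) < minimum_free
```

A scheduled task could call disk_is_low for each drive that holds logs, indexes, or database files and raise an alert when it returns True.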

Database Maintenance

Watch the following general items on the SQL Server:

  • Ensuring adequate disk space
  • Tables and indexes are not becoming fragmented
  • Backups are being performed regularly
  • The Nightly Maintenance plan is being run to keep the database cleaned up. This is implemented with an integration job called "Nightly Maintenance" that should be scheduled to run nightly.

WIS Monitoring

Active WIS servers are monitored by the simpleping.aspx page. Each time a WIS server connects, the Configured Commerce site tracks the connection. If a WIS server that has previously connected does not have a connection in the last 10 minutes, simpleping considers that WIS server down and will fail the health check.

Monitoring a specific WIS can be accomplished by ensuring the Windows service "Commerce Integration Service V3.7.x" is started and running on a server. Even if the service is running, it may not be successfully connecting and processing jobs. The simpleping.aspx check validates that each WIS is connecting.
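
The 10-minute rule described above can be illustrated with a short sketch. This is not the product's implementation; it only models the rule as stated. The server names are hypothetical, and treating a gap of exactly 10 minutes as still healthy is an assumption.

```python
from datetime import datetime, timedelta
from typing import Dict, List

STALE_AFTER = timedelta(minutes=10)  # window described in this article


def stale_servers(last_connected: Dict[str, datetime], now: datetime,
                  stale_after: timedelta = STALE_AFTER) -> List[str]:
    """Return the names of servers whose last connection is older than
    stale_after, sorted alphabetically."""
    return sorted(name for name, seen in last_connected.items()
                  if now - seen > stale_after)
```

The model also shows why a renamed or uninstalled WIS server fails the health check: its old name stays in the connection history and never connects again.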

WIS Troubleshooting

If simpleping.aspx has indicated an issue with WIS services, follow the steps below to troubleshoot it:

  1. RDP into the WIS server
  2. Ensure the Windows service "Commerce Integration Service V3.7.x" is running
    1. If the service is not running, right-click it and click Start
    2. If it fails to start, check the log file at [WIS installation path]/logfile.txt
      1. Scroll to the bottom of the log to find the newest messages
      2. The error message below indicates that the user/password combination is not validating properly in the Configured Commerce database. Double-check the [WIS installation path]/siteconnections.config file for the values and ensure that the user exists in Configured Commerce, has the ISC_Integration role, has the correct password, and is approved and not locked out.

      System.ServiceModel.FaultException`1[System.ServiceModel.ExceptionDetail]: Invalid UserName or Password (Fault Detail is equal to An ExceptionDetail, likely created by IncludeExceptionDetailInFaults=true, whose value is:

      System.Exception: Invalid UserName or Password

      3. The error message below indicates that the WIS server has not been allowed to connect. In Configured Commerce, browse to Global Settings - Application Settings and ensure the server name is included in the comma-delimited list in the setting named ERP_IntegrationServiceAllowedMachines.

      System.ServiceModel.FaultException`1[System.ServiceModel.ExceptionDetail]: Machine [ServerName] is not explicitly allowed to connect. See application setting ERP_IntegrationServiceAllowedMachines. (Fault Detail is equal to An ExceptionDetail, likely created by IncludeExceptionDetailInFaults=true, whose value is:

      System.Exception: Machine [ServerName] is not explicitly allowed to connect. See application setting ERP_IntegrationServiceAllowedMachines.

      4. Some error messages may indicate that WIS cannot connect to Configured Commerce due to network issues. On the WIS server, use a browser to ensure that you can access [SiteUrl]/integration/integration.svc. If that fails, there may be a network issue between WIS and your installation of Configured Commerce, or there may be an application-level issue with Configured Commerce.
      5. The error message below indicates a mismatch on http vs https.

      System.ServiceModel.EndpointNotFoundException: There was no endpoint listening at [EndPoint] that could accept the message.

      For the above error, note that the following comment exists in the web.config file:

      <!-- Comment out security node to run integration service over http, uncomment for https -->
      <!--<security mode="Transport"><transport clientCredentialType="None" /></security>-->
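
Scanning the WIS log from the bottom for the error messages quoted in this section can be sketched as follows. The marker strings come from the messages above. The short cause labels and the sample log lines used to exercise the function are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Substrings of the error messages quoted in this section,
# each mapped to a short label for the likely cause.
KNOWN_ERRORS = [
    ("Invalid UserName or Password", "credentials"),
    ("is not explicitly allowed to connect", "allowed machines"),
    ("There was no endpoint listening at", "http/https mismatch"),
]


def latest_known_error(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (1-based line number, cause) for the newest recognized
    error in the log, or None if no known message is present."""
    numbered = list(enumerate(lines, start=1))
    # Newest messages are at the bottom, so search in reverse.
    for number, line in reversed(numbered):
        for marker, cause in KNOWN_ERRORS:
            if marker in line:
                return number, cause
    return None
```

Searching from the end matters because an older error that has already been fixed may still be in the file above the one that is currently stopping the service.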