
It occurred to me recently, after chatting with some people from the testing community, that not everyone runs automated tests, or does any kind of testing, in the production environment. To me that seems a bit unnatural, since I have done it on all the projects I have worked on. So, here are a few thoughts that might convince you that you do need to run automated tests even in production:
when you have no production monitoring: there are cases when the project you are working on is very old, or simply has no production monitoring or tests in place. When there is no production monitoring whatsoever, how exactly can you know that your software is not working properly? Usually it's when customers ring in, complaining about the issues they have using your software. That late feedback can be avoided by creating an automation suite that runs periodically in your production environment. Tests that cover the most important user flows will give you early warning that something is wrong in production.
when you have production monitoring: production monitoring is very useful, but keep in mind that it usually detects hardware or load issues, not functionality issues. In many cases monitoring cannot determine that specific user scenarios are not working. It can tell you that the application is slow, that certain parts of it will not work (if a dependency or service is down, for example), or that a user transaction is not going through. But it cannot tell you that a user could not increase the number of products in their shopping cart because a button was broken, or could not choose expedited shipping for a product because that dropdown did not work, and so on. Automated tests for user scenarios can detect the issues users hit when trying to use your software, and can give you insight into why certain functionalities are rarely or never used, even when you would expect them to be.
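To make this concrete, here is a minimal sketch of what such a user-flow check could look like. The `/cart/add` endpooint, the payload fields, and the `fake_fetch` stub are all hypothetical, invented for illustration; in a real production suite the fetch function would call your actual site.

```python
# Sketch of a production flow check for a hypothetical shop.
# The fetch callable is injected, so the same check can run against
# the real production site or a stub, as in this example.

def check_add_to_cart(fetch):
    """Exercise the add-to-cart flow via the given fetch callable.

    `fetch(path, payload)` is assumed to return a dict-like JSON response.
    Returns a list of failure messages; an empty list means the flow works.
    """
    failures = []
    cart = fetch("/cart/add", {"product_id": "SKU-1", "quantity": 2})
    if cart.get("status") != "ok":
        failures.append("add-to-cart did not return status 'ok'")
    if cart.get("quantity") != 2:
        failures.append("cart quantity was not updated to 2")
    return failures


# Stubbed response standing in for the production endpoint.
def fake_fetch(path, payload):
    return {"status": "ok", "quantity": payload["quantity"]}

print(check_add_to_cart(fake_fetch))  # an empty list means the flow passed
```

A check like this, scheduled to run every few minutes, catches the "broken button" class of problems that infrastructure monitoring never sees.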
production environment architecture and setup differ from those of a test environment: how often does a test environment correctly reflect the architecture or setup of the production environment? Possibly never. Test environments are usually the worst: not enough resources allocated, not the same configuration, not the same architecture, and so on. Therefore, tests that pass in a test environment would often fail if run in production. That means a functionality will not work in production because it was built and tested in an environment different from the one it was really meant for. Just because a piece of software works properly in one environment does not guarantee it will behave the same way in another. Keep in mind there might be all kinds of settings in production that can make your software behave differently from the test environments.
the number of active users is different from test environments: in a test environment you cannot realistically simulate the number of users active on your software during normal operating hours. The load generated by users can also lead to software degradation, so having automated tests running in production under normal and high load can identify issues caused by load.
one of your dependencies performs an unannounced release: the features you push to production often depend on other features or libraries. If somebody who works on those dependencies changes something that affects your own features, production tests can detect that your features no longer work properly. This is especially useful when your dependencies change things without letting you know, and without you having the option to test those changes in a pre-production environment.
if your site is content managed: someone could change parts of your product without you knowing, through a content management system (CMS), which can lead to broken pages or invalid or missing content. Having your automated tests run periodically will verify that those areas still work properly, without you having to know when the content changes are made.
you might have flags that need to be set in production: maybe you want to enable certain features, or set some feature properties, by turning on a flag somewhere. If you have automated tests running periodically, you can check whether the expected features were indeed enabled when they were supposed to be. Also, after the flag is enabled, the automated tests will report a failure if for some reason the flag was mistakenly changed to a value it shouldn't have.
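A flag check of this kind can be very small. The sketch below assumes flags are exposed as a simple name-to-value mapping; the flag names and the `EXPECTED_FLAGS` table are hypothetical examples, not a real API.

```python
# Sketch of a periodic feature-flag check. The expected values describe
# what production *should* look like right now; the live values would
# normally be fetched from the running system.

EXPECTED_FLAGS = {"expedited_shipping": True, "new_checkout": False}

def verify_flags(live_flags, expected=EXPECTED_FLAGS):
    """Return a description of every flag whose live value differs from the expected one."""
    return [
        f"{name}: expected {want}, got {live_flags.get(name)}"
        for name, want in expected.items()
        if live_flags.get(name) != want
    ]

# Example: someone flipped new_checkout on by mistake.
print(verify_flags({"expedited_shipping": True, "new_checkout": True}))
```

Run periodically, this reports both cases from the paragraph above: a flag that never got enabled when it should have been, and a flag that was later changed by mistake.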
regression, when changes that should not affect a feature do affect it: sometimes the changes being deployed to production are rather small, and before releasing you might not think a full regression run is needed. Production automated tests that run periodically can detect whether deploying these small changes affected the main critical flows.
One thing is certain: the set of tests that run in production is not the same set that runs in the test environments. Production tests are lighter and should cover only the main critical scenarios, not every test scenario you can think of for a given feature. They need to cover at least the happy flows, to make sure the user can use the site according to its purpose.
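One common way to keep the production suite lighter than the full suite is to tag tests and select only the production-safe ones. This is a sketch under assumed conventions; the test names and tags are invented, and in practice a test framework's own marker or tag mechanism would do the filtering.

```python
# Sketch of selecting a production subset from a full test suite by tag.
# Only tests tagged "smoke" (the critical happy flows) run in production.

FULL_SUITE = [
    ("test_homepage_loads", {"smoke"}),
    ("test_add_to_cart_happy_path", {"smoke"}),
    ("test_cart_rejects_negative_quantity", {"edge-case"}),
    ("test_checkout_happy_path", {"smoke"}),
    ("test_checkout_with_expired_coupon", {"edge-case"}),
]

def select_for_production(suite, required_tag="smoke"):
    """Keep only the tests tagged as critical and safe enough for production."""
    return [name for name, tags in suite if required_tag in tags]

print(select_for_production(FULL_SUITE))
# → ['test_homepage_loads', 'test_add_to_cart_happy_path', 'test_checkout_happy_path']
```

The edge-case tests still run in the test environments; production only gets the happy flows that prove the site can be used for its purpose.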