One thing which I see happening quite often on number of different projects is that developers have very limited access to the production servers. Actually in some cases they don’t have access at all. That causes number of problems, especially when there are integration points with other systems.
Of course as long as application is up and running lack of access is not a problem. But when something is wrong, developers are supposed to support the application and fix the ongoing problems. Usually in scenarios like this, when system on production is not fully functional, there is a lot of pressure to nail down the problem and fix it quickly. Developers without access are literally blind hence emails like ‘can we get remote desktop to production servers?’ between them, management and server admins are normal. If in the end developers will get a remote desktop access they can consider themself being lucky. Typically server admins and security policies are an impassable wall.
So what can development team do to put themselves in a better position?
Here are a few ideas:
- There should be a diagnostics tool for each integration point. Especially if an application relay on 3rd party systems. Developers should have a way to check if 3rd party system is responding and if it’s responding in reasonable time. Another useful feature is to have ability to invoke calls on 3rd party system with some test data and verify if results are correct.
- Each call to external systems should be logged and developers should have an option to generate a report with all the calls. It’s a great help when there is a need to investigate which system doesn’t work as expected. As long as developers can show evidence that for instance the applications asked 3rd party system to send x emails and only subset of them got delivered then it is clear where the problem is.
- There should be a way to gather data regarding server’s performance. With time application most likely will be more and more popular which means increased load. Also amount of data stored within application will grow. It’s essential to monitor if servers are dealing with current level of traffic and data.
- All unexpected errors should be logged and easily accessible. It again means that there should be an easy way to generate report with all unexpected errors. (like 500 error pages) I have seen a few times that information about unexpected errors were logged but they were buried somewhere deep in 1GB log files among other completely irrelevant information. In such cases developers can say that this information is logged but can they easily find it? Can they easily get all log files? Can they easily access all the servers?
Having access to all those data, developers can perform investigation and in reasonable time identify a few potential culprits. Moreover, on top of that layer they can build automatic monitoring system which can do most of the job for them and in case of unexpected errors for instance send emails to relevant people.
But to have all of that in place developers and also management have to realize that having diagnostics tools in place is not a strange requirement; it’s their duty as a professionals to foresee the problems and in future be able to tell what is going on with their application in any environment.