How to Tell If Your T24 Migration Is in Trouble Before It's Too Late

The status report says green. The project team says on track. The go-live date is three months away and nobody has raised a red flag. This is, historically, exactly the point at which you should start asking harder questions.

Every T24 migration project produces status updates. The status updates are, almost without exception, reassuring. Milestones are being met. Testing is progressing. The team is working through the issues list. Go-live remains on track.

This is not a conspiracy. It is a structural feature of how projects report. The people closest to the problems are also the people writing the status reports, and humans are, as a general rule, considerably more optimistic about things they are personally responsible for than a disinterested observer would be. The status report is, in this sense, less a description of reality and more a description of what the project team sincerely hopes reality will become by go-live.

What follows are six signals that tend to indicate a migration is in more trouble than the dashboard suggests. None of them require technical knowledge to spot. All of them can be surfaced by asking a small number of direct questions and paying close attention to the answers.

Signal one: Testing has been done mainly by the people who built it

There is a meaningful difference between a system that has been tested and a system that has been confirmed to do what the people who built it expected it to do. These two things are often mistaken for each other.

In a TAFJ migration, the people who know the environment best are the project team and the technical leads. They are, quite naturally, the people doing most of the testing. They know where the known issues are. They know how to work around the bits that are not quite right yet. They know which test scenarios to run and, in some cases, which ones to avoid because the results are inconvenient.

The operations team — the people who will actually be running this environment at 2am six months after go-live — often arrive late to testing, if they arrive at all before the project is declared complete. They do not know where the workarounds are. They do not know which bits are not quite right yet. They will find out, but they will find out in production.

The question to ask: Has the operations team been testing the new environment themselves, independently of the project team — or have they been observing, signing off, and being briefed? The answer will tell you a great deal about how much of the risk has actually been found versus how much of it is still waiting.

Signal two: Nobody can give you a complete list of the local code

Every T24 environment that has been running for more than a few years contains local modifications. These are custom routines, user exits, and local overrides that were added over time to make the system do things the standard product does not do, or to fix things the standard product does in a way that was deemed inconvenient. They were written by people who were, at the time, solving a real problem. Some of those people are still at the bank. Many of them are not.

Local code is the highest-risk item in a TAFC-to-TAFJ migration. It does not come with documentation. It does not come with tests. It comes with, at best, a comment at the top of the file that says something like “modified for client requirements” and a date from several years ago. In TAFJ, it has to compile, deploy, and behave correctly in a Java runtime environment that is fundamentally different from what it was written for. Some of it will. Some of it will not. The ones that do not will fail in production, usually at the worst possible moment.

The question to ask: How many local modifications exist in the current environment, and for each one, who owns it, what does it do, and has it been individually tested in TAFJ? If the answer to the first part involves any uncertainty, or if “individually tested” turns out to mean “included in the bulk compile,” the risk in this area is not yet understood.
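The inventory itself can be produced mechanically as a starting point. Below is a minimal sketch, in Python, that compares the local routines found on disk against a manually maintained register of owner, purpose, and TAFJ test evidence. The directory name, the "L." file prefix, and the register layout are all assumptions for illustration — naming conventions vary by installation and should be checked against the actual environment.

```python
# Hypothetical inventory check: flags local routines that exist on disk but
# have no entry in the ownership/test-evidence register. The "bp-local"
# directory and "L." prefix are illustrative assumptions, not T24 standards.
import csv
from pathlib import Path

SOURCE_DIR = Path("bp-local")               # assumed location of local source
REGISTER = Path("local-code-register.csv")  # columns: routine,owner,purpose,tafj_tested

def unregistered_routines(source_dir: Path, register: Path) -> list[str]:
    """Return local routines present on disk but missing from the register."""
    on_disk = {p.name for p in source_dir.glob("L.*")}
    with register.open(newline="") as f:
        registered = {row["routine"] for row in csv.DictReader(f)}
    return sorted(on_disk - registered)

if __name__ == "__main__" and SOURCE_DIR.is_dir() and REGISTER.exists():
    for name in unregistered_routines(SOURCE_DIR, REGISTER):
        print(f"UNREGISTERED: {name}  (no owner, no test evidence)")
```

An empty result does not mean the local code is safe — only that every routine at least has a named owner and a claimed test. But a non-empty result answers the question above immediately: the risk in this area is not yet understood.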

Signal three: The operations team has signed off but not actually operated the environment

Readiness sign-offs are a feature of every migration project. They exist because someone needs to formally declare that a thing is ready before you can officially proceed, and that someone needs to be on record as having done so. This is entirely reasonable from a governance perspective. It is, however, not the same as the operations team actually being ready.

There is a specific test worth applying here. In TAFJ, the Close of Business batch process — COB — runs differently from how it ran in TAFC. The monitoring is different. The log files are in different locations. The restart procedures are different. The things that go wrong are, in some cases, different. An operations team that has been briefed on these differences and shown them during project demonstrations knows about them. An operations team that has actually run a complete COB cycle in the TAFJ environment — from start to finish, themselves, without someone from the project team in the room to help — knows them in a different way entirely.

The question to ask: Has the operations team run a complete COB cycle in the TAFJ environment without assistance from the project team? And have they done it more than once? If the answer is no, or if it turns out that the test environment COB has only been run during project-managed sessions, the operations team is going to be learning on the job after go-live. This is common. It is worth knowing about in advance.

Signal four: The rollback plan was last updated in month two

Every migration project has a rollback plan. It is one of the first documents produced, because it is one of the first things that governance asks for, and producing it early demonstrates that the project is being managed responsibly. It is also, in many cases, the last time anyone seriously thinks about it until someone asks whether it is still current.

A rollback plan written in month two of a twelve-month project was written before the environment was built, before the complexity of the local code was understood, before the interface dependencies were fully mapped, and before the go-live date acquired the weight of executive commitment and contractual obligation that makes the phrase “we might need to roll back” feel, to those who would have to say it, approximately as appealing as announcing a fire drill during a board meeting.

The practical result is that many migration rollback plans are theoretical documents rather than operational ones. They describe what rollback would look like in principle. They do not describe the specific steps, the people responsible for each one, the decision criteria that would trigger it, or — crucially — who has the authority and the standing to make that call when the project manager and the programme sponsor are both on the call saying that they just need a few more hours.

The question to ask: When was the rollback plan last reviewed? Who makes the rollback decision, and what conditions trigger it? Has anyone rehearsed it? If those questions cannot be answered in specific operational terms, the rollback plan is a comfort document, not a risk management one.

Signal five: The date is driving readiness, not the other way around

Go-live dates have a way of becoming immovable for reasons that have nothing to do with whether the system is ready. Contractual commitments. Regulatory deadlines. Executive announcements made at a point in the project when optimism was running high and the hard work was mostly theoretical. Board presentations with a date on a slide that has since been distributed to a number of people who are not on the project call.

When a date cannot move, readiness tends to get redefined to fit it. Not through dishonesty — through the entirely human process of deciding that the remaining gaps are manageable, that the untested scenarios are edge cases, that the operations team will be fine, and that the post-go-live support plan will cover anything that comes up. These decisions are sometimes correct. They are sometimes not.

The signal to watch for is not that a date exists — all projects have dates. It is whether the conversation about readiness feels like a genuine assessment or like a process of building a case for a conclusion that has already been reached. If every concern raised in the steering group is met with a mitigation before anyone has had time to think about whether the mitigation is adequate, that is worth noting.

The question to ask: If the readiness assessment showed that the environment was not ready, would that change the go-live date? Who has the authority to make that call, and have they said so explicitly? The answer will tell you whether readiness is being assessed or performed.

Signal six: “All interfaces have been tested” — but at what volume, with what data

Interfaces — the connections between T24 and every other system the bank runs — are among the most common sources of post-go-live incidents in a TAFJ migration. File formats that are subtly different. Timing windows that shift because COB now runs on a slightly different schedule. Character encoding that was never formally specified because it never needed to be. Error handling paths that work correctly when everything is fine and behave unexpectedly when something is not.

The word “tested” covers an enormous range of activities. It can mean that a small number of representative transactions were sent through and the output was eyeballed by the person who wrote the interface. It can mean that a full production-volume day was replayed through the test environment and the output was reconciled line by line against the expected result. These are both “tested.” They are not the same thing.

The question to ask: For each business-critical interface, what volume was it tested at, and how was the output validated? Has it been tested at peak-day volume — not average-day volume? Have the counterparty systems been involved in the testing? If the answers are vague, or if “validated” turns out to mean “it ran without errors,” the interface risk is not yet adequately understood.
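The difference between the two meanings of “validated” is easy to make concrete. A minimal reconciliation sketch, in Python, that compares an interface output file line by line against the expected result and reports every divergence — rather than treating a clean exit code as validation. The file names are illustrative, not a T24 convention:

```python
# Minimal line-by-line reconciliation sketch: reports every line where the
# actual interface output differs from the expected output, including lines
# present in one file but missing from the other. Illustrative only.
from itertools import zip_longest
from pathlib import Path

def reconcile(actual_path: Path, expected_path: Path) -> list[str]:
    """Return a description of every line that differs between the two files."""
    actual = actual_path.read_text().splitlines()
    expected = expected_path.read_text().splitlines()
    differences = []
    for i, (got, want) in enumerate(zip_longest(actual, expected), start=1):
        if got != want:
            differences.append(f"line {i}: expected {want!r}, got {got!r}")
    return differences
```

A script like this run against a replayed peak-day file answers the question above in one number. “It ran without errors” answers nothing.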

The most useful question of all

All of the questions above will produce useful information. But there is one question that tends to produce more honest information than any of them, and it is not a technical question at all.

Find a senior member of the operations team — not the project team, not the technical lead, not the programme manager — and ask them, privately, whether they feel ready. Not whether the system is ready. Whether they feel ready to operate it.

The operations team will sign a readiness checklist if they are asked to, because that is what happens at this stage of every project and they understand what is expected of them. They will, in a private conversation, tell you something considerably closer to the truth. If the honest answer is “we'll manage, but there are things I'm still not confident about,” that is important information. If the answer is “yes, completely,” that is also useful to know.

The purpose of the questions in this article is not to stop migrations from happening. Most of the signals above are present in most migrations to some degree, and most migrations still go ahead and eventually land successfully, albeit occasionally with a difficult few weeks in between. The purpose is to make sure that the people responsible for the decision are making it with accurate information rather than optimistic information, and that when things do go wrong — as some things always do — they were not surprises.