Infrastructure
Containerization and Cloud for T24
T24 has historically run on bare metal or virtual machines with JBoss and a relational database. That is changing. Banks are wrapping TAFJ in Docker containers, orchestrating with Kubernetes, and migrating workloads to AWS, Azure, and Temenos Cloud. This article covers what that actually looks like — and what breaks when you try it.
If you have worked with T24 for more than a few years, you have a mental model of what a T24 environment looks like. It is a server — physical or virtual — running a Linux operating system, a JBoss application server, an Oracle or MSSQL database, and a TAFJ runtime that has been configured by someone who left the bank three years ago and whose documentation is a single text file called NOTES.txt.
That model is not wrong. It is just increasingly outdated. The industry is moving toward containerized deployments, cloud infrastructure, and orchestrated environments where TAFJ instances can be scaled up and down like any other microservice. This article explains what that means in practice — not as a vendor slide deck, but as something you might actually have to build, probably while someone asks you why it is taking so long.
Why containerize T24 at all?
The traditional T24 deployment model works. Banks have been running it for decades. But it has problems that containers solve:
- Environment drift. Every T24 environment is slightly different. The production server has patches that the UAT server does not. The DR server has a different JBoss version. The development server has been restarted so many times that nobody remembers what the original configuration looked like. Containers eliminate this by packaging the entire runtime — OS libraries, Java version, TAFJ binaries, configuration files — into a single immutable image. What you test is what you deploy. Revolutionary concept, really.
- Scaling. In the traditional model, scaling T24 means provisioning a new VM, installing JBoss, configuring TAFJ, and hoping the new instance behaves like the existing ones. With containers, you spin up another replica from the same image. It takes seconds, not days. It also does not require a change request, a CAB approval, and a three-week lead time.
- Disaster recovery. Traditional DR for T24 involves maintaining a separate physical or virtual environment that is kept in sync through database replication and file synchronization. With containers, DR is a matter of deploying the same image to a different cluster. The infrastructure becomes reproducible. You can test DR without needing to book a weekend and explain to your manager why you need access to a data centre in another country.
- CI/CD. Deploying a T24 change in the traditional model means copying files to the server, running compilation scripts, and hoping nothing breaks. With containers, you build a new image, run it through a pipeline, and deploy it to production with a rollback strategy built in. The rollback strategy is important because something will break, and when it does, you want to be able to go back to the previous version without having to explain to a room full of people why the deployment failed.
Docker and TAFJ — what actually works
TAFJ is a Java-based runtime, which makes it surprisingly container-friendly. A typical TAFJ Docker image includes the following (a Dockerfile sketch follows the list):
- A base image with the required Linux libraries (glibc, libstdc++, etc.)
- The JDK version that matches the TAFJ release
- The TAFJ runtime binaries and license files
- The application code (compiled .class files and DICT definitions)
- Configuration files (TAFJ_CONFIG, log4j properties, connection pools)
- Startup scripts that initialize the runtime and register with the cluster
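In Dockerfile terms, that list comes out roughly like the sketch below. Treat every detail as an assumption rather than the layout of any particular TAFJ release: the base image, the JDK version, the /opt/tafj and /opt/t24 paths, and the start-tafj.sh script are all placeholders to swap for whatever your release actually uses.

```dockerfile
# Sketch only: base image, paths, and script names are placeholders,
# not the layout of any particular TAFJ release.
FROM eclipse-temurin:11-jdk

# TAFJ runtime binaries and license files, staged by the CI pipeline
COPY tafj/ /opt/tafj/

# Application code: routines already compiled with tCompile/tIntegrate
# earlier in the pipeline, plus DICT definitions
COPY build/classes/ /opt/t24/classes/
COPY build/dict/ /opt/t24/dict/

# Environment-independent configuration; environment-specific values
# (connection strings, endpoints) arrive later via ConfigMaps and Secrets
COPY config/TAFJ_CONFIG /opt/tafj/conf/
COPY config/log4j.properties /opt/tafj/conf/

# Startup script that initializes the runtime and registers with the cluster
COPY scripts/start-tafj.sh /opt/tafj/bin/
RUN chmod +x /opt/tafj/bin/start-tafj.sh

ENV TAFJ_HOME=/opt/tafj
ENTRYPOINT ["/opt/tafj/bin/start-tafj.sh"]
```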
The tricky part is state. TAFJ instances are stateless in theory — the database holds all the persistent data — but in practice, TAFJ relies on local file system state for things like temporary files, compiled routines, and log output. A containerized TAFJ deployment needs to handle this carefully, which is a polite way of saying "you will discover things that break in ways you did not expect":
- Compiled routines should be baked into the image, not compiled at runtime. This means your CI pipeline needs to run tCompile and tIntegrate as part of the image build process. If you forget this step, your container will start, fail to find the compiled routines, and produce an error that looks like a missing file but is actually a missing build step. You will spend an hour debugging this. Everyone does.
- Log files should be written to stdout/stderr (for container logging) rather than to local files. This requires changing the log4j configuration to use a console appender instead of a file appender (see the sketch after this list). If you forget this step, your logs will be written to a file inside the container, and when the container is replaced, the logs disappear. This is fine until you need to debug something and realise the logs from the failed container no longer exist.
- Session state should be externalized. If a container is terminated and replaced, any in-memory session data is lost. This is usually fine for T24, where session state is minimal, but it matters for long-running batch processes. Killing a TAFJ pod mid-COB is a special kind of problem that you do not want to experience on a Sunday night.
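For the logging point above, the change is usually small. A minimal sketch, assuming a classic log4j 1.x-style properties file; if your TAFJ release ships log4j2 the syntax differs, but the idea is the same: a console appender instead of a file appender.

```properties
# Route all logging to stdout so the container platform collects it.
# Appender and pattern names are illustrative; keep the categories your
# existing configuration already defines.
log4j.rootLogger=INFO, CONSOLE
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c - %m%n
```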
Kubernetes — orchestrating TAFJ at scale
Once you have TAFJ running in a container, the next question is how to manage multiple instances. Kubernetes is the standard answer, and it works well for T24 if you design the deployment correctly. If you design it incorrectly, it works poorly, and you get to learn what "CrashLoopBackOff" means at 2am.
A typical T24-on-Kubernetes deployment includes the following (a skeleton manifest follows the list):
- Deployments for the TAFJ runtime instances, configured with horizontal pod autoscaling based on CPU or request queue depth.
- Services for internal communication between TAFJ instances and between TAFJ and the database.
- Ingress for external access — OFS requests, web services, and user sessions coming through a load balancer.
- ConfigMaps and Secrets for configuration that changes between environments — database connection strings, license keys, and integration endpoints.
- PersistentVolumeClaims for any state that must survive pod restarts — typically the database data files and any shared file system mounts.
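In manifest terms, a minimal skeleton for the first two items looks something like the sketch below; the autoscaler, Ingress, and PersistentVolumeClaims are left out for brevity. Every name, image tag, port, and replica count here is a placeholder, not a reference architecture.

```yaml
# Sketch only: names, image tag, ports, and replica counts are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tafj-runtime
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tafj-runtime
  template:
    metadata:
      labels:
        app: tafj-runtime
    spec:
      containers:
        - name: tafj
          image: registry.example.com/t24/tafj:2024.1   # placeholder registry and tag
          ports:
            - containerPort: 8080                       # port is illustrative
          envFrom:
            - configMapRef:
                name: tafj-config                       # environment-specific settings
            - secretRef:
                name: tafj-secrets                      # DB credentials, license keys
---
apiVersion: v1
kind: Service
metadata:
  name: tafj-runtime
spec:
  selector:
    app: tafj-runtime
  ports:
    - port: 8080
      targetPort: 8080
```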
The most common mistake in T24-on-Kubernetes deployments is treating TAFJ like a stateless microservice. TAFJ is not a microservice. It is a monolithic runtime that happens to run in a container. The deployment strategy needs to account for startup time (TAFJ can take several minutes to initialize, which means your readiness probe needs to be patient), session affinity (requests from the same user should go to the same instance, which means you cannot just round-robin everything), and graceful shutdown (killing a TAFJ pod mid-transaction can leave the database in an inconsistent state, which means your preStop hook needs to actually work).
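Those constraints map onto concrete fields. Session affinity can be handled with sessionAffinity: ClientIP on the Service above, or with cookie-based affinity at the ingress; startup and shutdown come down to probe and lifecycle settings on the pod template. A sketch of those additions follows, with illustrative timings, an assumed /health endpoint, and a hypothetical drain script.

```yaml
# Fragment of the Deployment above, not a complete manifest.
# Timings are illustrative; the health endpoint and drain script are assumptions.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 300    # give TAFJ time to finish in-flight work
      containers:
        - name: tafj
          readinessProbe:
            httpGet:
              path: /health                 # whatever health endpoint your image exposes
              port: 8080
            initialDelaySeconds: 180        # TAFJ can take several minutes to initialize
            periodSeconds: 15
            failureThreshold: 10
          lifecycle:
            preStop:
              exec:
                command: ["/opt/tafj/bin/drain-and-stop.sh"]   # hypothetical drain script
```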
Cloud providers — AWS, Azure, and Temenos Cloud
Every major cloud provider has customers running T24. The approach varies depending on the provider and the bank's regulatory requirements. The infrastructure is the easy part. The regulatory approval is the hard part.
AWS.
AWS is the most common cloud provider for T24 deployments. The typical architecture uses EC2 for compute (either directly or through EKS for Kubernetes), RDS for the database (Oracle or MSSQL), and S3 for backup and archival storage. Banks that have gone through the AWS migration process report that the infrastructure itself is straightforward — the hard part is the regulatory approval, the network security group configuration, and the data residency requirements. So is explaining to your cloud architect why T24 needs as much memory and CPU as it does, and why "but the documentation says" is not a valid answer.
Azure.
Azure is the second most common choice, particularly for banks that already have a Microsoft enterprise agreement. The architecture is similar — AKS for Kubernetes, Azure SQL for the database, Blob Storage for backups. Azure has an advantage in hybrid deployments where some workloads remain on-premises while others move to the cloud, because Azure's networking and identity integration with on-premises Active Directory is more mature than AWS's. This matters more than you think, because your bank's Active Directory environment has been accumulating users since 2005 and nobody wants to migrate it.
Temenos Cloud.
Temenos offers its own cloud platform, which is essentially a managed T24 environment running on Temenos's infrastructure. The advantage is that Temenos handles the infrastructure, the patching, and the upgrades. The disadvantage is that you have less control over the environment, and the pricing model is different from running T24 on a general-purpose cloud provider. Temenos Cloud is a good option for banks that want to offload infrastructure management entirely, but it is not necessarily cheaper than running T24 on AWS or Azure yourself. It is also not necessarily more expensive. It is just different, and "different" is a word that makes procurement teams nervous.
What to watch out for
Containerizing T24 and moving it to the cloud is feasible, but there are traps that catch teams who have not done it before. Here are the ones that show up in post-incident reviews most often:
Licensing.
T24 licensing is traditionally based on the number of cores or the processing capacity of the server. In a containerized environment where instances can scale up and down automatically, the licensing model becomes ambiguous. Some Temenos licensing agreements explicitly restrict running T24 in containers or on shared infrastructure. Check your license agreement before you start. The phrase "we did not realise the license restricted that" has been the opening line of several uncomfortable conversations between banks and Temenos account managers.
Network latency.
T24 was designed for a world where the application server and the database server are in the same data center, connected by a low-latency network. In a cloud deployment, the database might be in a different availability zone or even a different region. The additional latency can cause performance issues, particularly for batch processes that make thousands of database calls per second. Your COB that used to take two hours now takes four. Your manager will ask why. You will say "network latency." Your manager will look at you the way people look at you when they think you are making excuses.
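To put illustrative numbers on it: a COB run that makes 20 million database calls and picks up an extra half a millisecond of round-trip latency per call spends roughly 10,000 additional seconds, close to three hours, waiting on the network before anything else about the system changes.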
Regulatory approval.
Many banking regulators require that core banking systems run on infrastructure that is physically located in the country where the bank operates. Cloud providers offer local regions, but not every region has every service. If your bank operates in a country with strict data residency requirements, you may need to use a specific cloud region that does not support all the services you want to use. This means you get to explain to your cloud architect why you need a service that is not available in the region you are allowed to use, and the conversation will go in circles until someone escalates it.
The people problem.
The team that knows how to run T24 on bare metal is not the same team that knows how to run Kubernetes. Banks that migrate to the cloud without investing in training or hiring find themselves with a modern infrastructure that nobody knows how to operate. The Kubernetes cluster runs. The containers deploy. But when something breaks at 2am, the on-call engineer has never seen a Kubernetes pod status before and does not know what "CrashLoopBackOff" means. They will call someone who does know, but that someone is on holiday, and now you have a production incident and a learning opportunity at the same time.
The bottom line
Containerization and cloud migration for T24 is not a theoretical exercise. Banks are doing it today. The technology works — Docker containers for TAFJ are stable, Kubernetes orchestration is mature, and cloud providers have reference architectures for T24 deployments.
But it is not simple. The licensing, the network architecture, the regulatory requirements, and the skills gap all need to be addressed. The banks that succeed are the ones that treat the migration as a multi-year program, not a one-quarter project. The banks that fail are the ones that try to lift and shift their existing T24 environment into a container without understanding what changes. Do not be the bank that fails. Your 2am self will thank you.