How to Diagnose a Failed TPH Payment

When a payment stops in TPH, the difference between fixing it in ten minutes and spending three hours opening every screen you can think of is knowing where to look first, why, and what to check next, without guessing.

The moment a payment failure is declared, something interesting happens to the diagnostic process. Everyone wants to help. Several people open screens. Someone starts checking the queue. Someone else checks a different queue. Someone opens the message log and scrolls to the bottom, which is usually the wrong end. Someone asks whether it has anything to do with the COB that ran forty minutes late, and everyone pauses because that is actually a good question, but nobody knows how to answer it without opening three more screens.

The problem is not a lack of tools. TPH has excellent tools. The problem is that the tools are organised by function, not by diagnostic sequence. The repair queue screen tells you the payment is in repair, which you already knew. The message log tells you the message was received, which you already assumed. The enquiry tells you the payment exists, which was never in doubt. What you need is the answer to the question nobody asks first: where did it actually stop?

This article is the diagnostic sequence that answers that question — in order, with the screens that matter and the ones you can skip.

The diagnostic sequence

There are seven places to look, and a failure at each one produces different symptoms. The fastest diagnosis eliminates them in order, working from outside TPH inward. The reason for this direction is that external failures are both the most common and the easiest to confirm. If you start with the core posting, which is the least common cause and the hardest to check, you will spend thirty minutes ruling things out while the answer is sitting in the first screen you should have opened.

Step 1: Did the message actually arrive?

This sounds simple. It is simple. And yet a surprising number of payment investigations begin with the assumption that the message arrived and then go looking for why TPH did not process it, only to discover — eventually, after eliminating everything else — that the message never reached TPH at all.

Start with the message channel. If the payment is inbound from SWIFT, check whether the SWIFT message was received. If it came via a file, check whether the file arrived in the pickup directory. If it came from an upstream system via OFS or API, check whether the interface logged the request. This is not a TPH question. It is a question about the boundary between TPH and everything outside it. Answer it first.

If the message did not arrive, the problem is not in TPH and opening TPH screens will tell you nothing useful. If it did arrive, you now know the problem is somewhere between message receipt and payment settlement, which is the territory the rest of this article covers.

One specific thing worth checking here: the TPH messaging framework has its own message monitoring. The PP.MESSAGE.MONITOR enquiry will show you whether the message was received by TPH's inbound framework, regardless of what happened next. If the message is not there, it was not received. If it is there, note the status and move to step 2.
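For files, the boundary check is easy to script. The sketch below is a minimal example, assuming a hypothetical pickup directory and an optional archive directory: deployments name and place these differently, and a file TPH has already collected may no longer be in the pickup directory, so pass every location a file can end up in. It answers one question only: did a file whose name contains a given reference land on the boundary?

```python
from pathlib import Path


def find_arrived_files(reference: str, *dirs: Path) -> list[Path]:
    """Return files whose names contain `reference` in any of the given directories.

    Missing directories are skipped rather than raising, so a wrong path simply
    yields no match; double-check the paths before trusting an empty result.
    """
    matches: list[Path] = []
    for directory in dirs:
        if not directory.is_dir():
            continue
        matches.extend(
            p for p in directory.iterdir() if p.is_file() and reference in p.name
        )
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical locations and reference: substitute your deployment's
    # pickup directory, any archive directory, and the file's name fragment.
    found = find_arrived_files(
        "PAY20240115", Path("/data/tph/pickup"), Path("/data/tph/archive")
    )
    if found:
        print("File reached the boundary:")
        for path in found:
            print(" ", path)
    else:
        print("No matching file in these directories: look upstream of TPH.")
```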

Step 2: Is it in a repair queue, and if so, which one?

When a message arrives but TPH cannot process it automatically, it goes to a repair queue. The key thing to understand is that there are three repair queues, not one, and they handle different categories of problem. Looking in the wrong queue is a slower version of looking in the right one, and the wrong one will be empty, which is reassuring in a way that is not actually helpful.

File-level repair: the file could not be recognised or parsed. The entire file is held, not individual payments. This happens when the message format does not match any configured inbound mapping, or when the file structure is corrupted. If the file was received but no bulk or payment appeared in the other two queues, this is the queue to check.

Bulk-level repair: the file was recognised but the bulk within it could not be processed. This happens when the bulk header contains invalid data or when the bulk references a product or channel TPH does not recognise.

Transaction-level repair: the individual payment failed a business validation. This is the most common category and the one most people mean when they say “the payment is in repair.” The error code tells you why. Learning to read error codes without looking them up is the single biggest accelerator for TPH diagnostic speed. BAC20002 means insufficient funds. Not every error code is BAC20002, but enough of them are that you should check for it first, before assuming the payment has a deeper problem.

The three repair queues are accessed through different enquiries. If you only know about one of them, you are missing two thirds of the diagnostic surface. Learn all three. The night you need the file-level repair queue is not the night to discover it exists.
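The queue descriptions above amount to a small decision rule. This sketch encodes it, assuming you can observe three things from the message monitor and the enquiries (whether the file was received, whether any bulk from it is visible, whether the payment itself is visible); the flag names are hypothetical and the logic is framed around a file-based channel.

```python
from enum import Enum
from typing import Optional


class RepairQueue(Enum):
    FILE = "file-level repair"
    BULK = "bulk-level repair"
    TRANSACTION = "transaction-level repair"


def first_queue_to_check(
    file_received: bool, bulk_visible: bool, payment_visible: bool
) -> Optional[RepairQueue]:
    """Choose the repair queue to open first from what has been observed so far."""
    if not file_received:
        # Nothing reached TPH: this is a step 1 problem, not a repair queue one.
        return None
    if not bulk_visible:
        # Received but never split into bulks: the file could not be recognised or parsed.
        return RepairQueue.FILE
    if not payment_visible:
        # A bulk exists but no payments: invalid bulk header or unknown product/channel.
        return RepairQueue.BULK
    # The payment exists, so look for a failed business validation.
    return RepairQueue.TRANSACTION
```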

Step 3: What does the payment status actually mean?

TPH payment status codes are not random numbers. Each one describes a specific position in the payment lifecycle. The ones worth knowing immediately:

  • Status 19 — Warehoused. The payment is valid and waiting for its execution date. This is not a failure. This is the system working correctly. The payment will be released automatically when the execution date arrives. If the execution date was supposed to be today and the payment is at status 19, the date on the payment is wrong — check what the originating system sent.
  • Status 99 — Repair. The payment failed validation and is waiting for manual intervention. The error code tells you why. Fix the problem, release the payment, and it will continue through the lifecycle from where it stopped.
  • Status 999 — Completed. The payment has been processed and posted. If the beneficiary says they have not received it, the problem is after TPH — in the clearing, the correspondent bank, or the beneficiary's own bank.
  • Status 1 — Parked. The payment is held, usually because an external dependency did not respond in time. This is different from repair — the payment is valid, but something outside TPH (a sanctions screening system, a posting confirmation, a clearing acknowledgement) has not come back. The parked queue is where payments wait while TPH decides whether to retry or escalate.

The status alone narrows the diagnostic path significantly. A payment at status 19 with today's date is a data problem. A payment at status 99 is a validation problem. A payment at status 1 is an external dependency problem. Each leads to a different screen, a different investigation, and a different resolution. Checking the status before doing anything else saves you from investigating the wrong category of problem.
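The status-to-category mapping above fits in a lookup table. This is an illustrative sketch: the codes and meanings are the ones this article describes, the `due_today` flag is a hypothetical input you supply from what the originating system intended, and the table should be checked against your own release.

```python
# Status meanings as described in this article. Confirm them against your own
# TPH release before relying on the mapping.
STATUS_GUIDE: dict[int, tuple[str, str]] = {
    19: ("warehoused", "the execution date on the payment is wrong; check what the originating system sent"),
    99: ("repair", "validation problem; read the error code, fix it and release the payment"),
    1: ("parked", "external dependency problem; check the screening, posting or clearing system"),
    999: ("completed", "the problem is after TPH; check the clearing, correspondent or beneficiary bank"),
}


def next_step(status: int, due_today: bool = False) -> str:
    """Translate a payment status into the investigation it points to.

    `due_today` says whether the payment was supposed to execute today; it only
    matters for status 19, where a future-dated payment is working as designed.
    """
    entry = STATUS_GUIDE.get(status)
    if entry is None:
        return f"Status {status}: not covered by this guide; look it up before guessing."
    name, action = entry
    if status == 19 and not due_today:
        return "Warehoused: waiting for its execution date; not a failure."
    return f"{name.capitalize()}: {action}"
```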

Step 4: Did a business validation fail?

If the payment is in repair with a validation error code, the next question is: which validation? TPH runs multiple validation layers, and they fire in a specific order. Understanding this order means you can often predict which validation will fail before the payment even reaches it.

Message validation — is the message structurally valid? Required fields present? Formats correct? This fails early and the error messages are usually clear. Missing beneficiary BIC. Invalid currency code. Mandatory field not populated.

Account validation — does the debit account exist, is it active, and does it belong to the customer initiating the payment? These validations check the core banking system, which means they can fail for reasons that have nothing to do with the payment itself — a frozen account, a closed account, an account that was migrated incorrectly.

Balance validation — is there enough money? This is BAC20002 territory. Check the available balance, not the ledger balance. The difference between them is the reason some payments fail balance validation even when the account looks fine in the customer information screen.
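The gap between ledger and available balance is easiest to see with numbers. The sketch below uses a deliberately simplified model, assumed for illustration: available balance is the ledger balance less held (locked) amounts and uncleared funds, plus any arranged overdraft. Your core banking configuration defines the real formula.

```python
from decimal import Decimal


def available_balance(
    ledger: Decimal,
    held: Decimal,
    uncleared: Decimal,
    overdraft_limit: Decimal = Decimal("0"),
) -> Decimal:
    """Simplified model: ledger less held and uncleared funds, plus any overdraft.

    An illustration of why the two balances differ, not the core system's rule.
    """
    return ledger - held - uncleared + overdraft_limit


def passes_balance_check(
    amount: Decimal,
    ledger: Decimal,
    held: Decimal,
    uncleared: Decimal,
    overdraft_limit: Decimal = Decimal("0"),
) -> bool:
    # Balance validation compares against the available balance, not the ledger.
    return amount <= available_balance(ledger, held, uncleared, overdraft_limit)


if __name__ == "__main__":
    ledger = Decimal("10000.00")
    print(available_balance(ledger, Decimal("2500.00"), Decimal("3000.00")))  # 4500.00
    print(passes_balance_check(Decimal("6000.00"), ledger, Decimal("2500.00"), Decimal("3000.00")))  # False
```

An account showing 10,000 on the ledger can still fail a 6,000 payment, because in this example only 4,500 of it is available.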

Limit validation — does the payment amount exceed configured limits? These limits can be per customer, per account, per product, or per channel. When a payment that worked yesterday fails today, check whether a limit was changed. Limits are the kind of configuration that someone adjusts during business hours and forgets to mention.

Sanctions validation — did the screening system flag the payment? If the screening system is slow or unavailable, the payment may have timed out waiting for a response. Check the screening system status before assuming the payment itself is the problem.

The diagnostic move here is to identify which validation layer failed and then focus the investigation on that layer specifically. If the payment failed balance validation, checking the message format and the sanctions status is a waste of time. Find the error code, map it to the validation layer, and investigate only that layer.
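Mapping an error code to its layer is worth writing down once and reusing. In this sketch only BAC20002 comes from the article; the dictionary is a hypothetical starting point to extend from your release's error-code documentation, and the layer order is simply the order listed above.

```python
# Validation layers, in the order this article lists them.
VALIDATION_LAYERS = ("message", "account", "balance", "limit", "sanctions")

# Only BAC20002 comes from this article. Extend the mapping from your own
# release's error-code documentation rather than from memory.
ERROR_CODE_LAYER: dict[str, str] = {
    "BAC20002": "balance",
}


def triage(error_code: str) -> tuple[str, list[str]]:
    """Return the layer to investigate and the layers you can skip.

    An unmapped code returns ("unknown", []): nothing can be ruled out yet.
    """
    layer = ERROR_CODE_LAYER.get(error_code)
    if layer is None:
        return "unknown", []
    return layer, [other for other in VALIDATION_LAYERS if other != layer]


if __name__ == "__main__":
    print(triage("BAC20002"))
    # ('balance', ['message', 'account', 'limit', 'sanctions'])
```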

Step 5: Did product determination route it wrong?

Product Determination is the engine that decides how a payment is processed. It checks the payment's attributes — currency, amount, geography, payment type, priority — against a set of configured product conditions and finds the best match. That match defines the routing, the posting, and the fees.

When product determination goes wrong, it goes wrong silently. The payment processes. It posts. The status goes to 999 — completed. And the money went to the wrong correspondent, or through the wrong clearing, or with the wrong charge code applied. There is no error. There is no repair queue entry. The system did exactly what the product conditions told it to do. The product conditions are just wrong.

The diagnostic signs of a product determination problem are subtle:

  • The payment completed but went through an unexpected channel
  • Multiple payments with similar attributes are all routing incorrectly
  • The routing changed on a specific date with no corresponding change request
  • The fees or charges applied do not match what the product documentation says

To diagnose: check the product conditions for the payment type in question. Look at the matching criteria — currency ranges, amount thresholds, geography rules. If two product conditions have overlapping criteria, TPH will match the more specific one. If someone added a new product condition that is slightly more specific than the existing one, payments that used to match condition A now match condition B, and nobody noticed because both conditions looked correct in isolation.
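Overlap is easier to reason about with a toy model. The sketch below assumes, for illustration only, that a condition is more specific when it constrains more criteria and that the first condition wins a tie; the article states only that TPH matches the more specific condition, and real product conditions carry more attributes and their own precedence rules. `ProductCondition`, `matching` and `determine` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class ProductCondition:
    """A hypothetical, simplified product condition: None means 'any'."""

    name: str
    currency: Optional[str] = None
    min_amount: Optional[Decimal] = None
    country: Optional[str] = None

    def matches(self, currency: str, amount: Decimal, country: str) -> bool:
        return (
            (self.currency is None or self.currency == currency)
            and (self.min_amount is None or amount >= self.min_amount)
            and (self.country is None or self.country == country)
        )

    @property
    def specificity(self) -> int:
        # Number of criteria actually constrained: more criteria, more specific.
        return sum(v is not None for v in (self.currency, self.min_amount, self.country))


def matching(
    conditions: list[ProductCondition], currency: str, amount: Decimal, country: str
) -> list[ProductCondition]:
    return [c for c in conditions if c.matches(currency, amount, country)]


def determine(
    conditions: list[ProductCondition], currency: str, amount: Decimal, country: str
) -> Optional[ProductCondition]:
    """Pick the most specific matching condition (the first one wins a tie)."""
    candidates = matching(conditions, currency, amount, country)
    return max(candidates, key=lambda c: c.specificity, default=None)


if __name__ == "__main__":
    a = ProductCondition("A: all EUR", currency="EUR")
    b = ProductCondition("B: EUR to DE", currency="EUR", country="DE")
    payment = ("EUR", Decimal("5000"), "DE")

    print(determine([a], *payment).name)  # A: all EUR
    # Adding B silently moves this payment, though A and B each look correct.
    print(determine([a, b], *payment).name)  # B: EUR to DE
    # More than one match means the outcome depends on specificity alone.
    print([c.name for c in matching([a, b], *payment)])  # ['A: all EUR', 'B: EUR to DE']
```

Listing every matching condition for a sample of affected payments, not just the winner, is the quickest way to expose an overlap that each condition hides on its own.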

Product condition overlap is one of the most enjoyable production problems to inherit. The person who created the overlapping condition has usually left the project. The documentation does not mention it. The routing has been wrong for weeks and the only reason anyone noticed is that a correspondent bank sent a politely worded query about unusual volumes.

Step 6: Did an external dependency time out?

TPH depends on external systems: sanctions screening, clearing gateways, correspondent connections, and — in standalone deployments — the core banking system for posting confirmations. Each of these dependencies can be slow or unavailable, and each failure produces different symptoms.

Sanctions screening timeout — the payment reaches the screening step and does not proceed. TPH will retry, but if the screening system is genuinely unavailable, all payments will queue behind the same timeout. The diagnostic sign: multiple payments stuck at the screening step simultaneously. The fix is not in TPH. It is in the screening system.

Clearing gateway timeout — the payment was released to the clearing but no acknowledgement was received. TPH may retry, park the payment, or mark it as sent and wait. The diagnostic sign: payments with a status indicating they were sent but no confirmation received. Check the clearing gateway status before assuming TPH did not send the payment.

Posting confirmation timeout — in a standalone deployment, TPH sends a posting instruction to the external DDA and waits for confirmation. If confirmation does not arrive, the payment goes to the parked queue. The diagnostic sign: payments stuck in parked status with the posting step as the last completed action.

The diagnostic move for external dependency problems is to check the dependency status before investigating the payment. If the clearing gateway is down, every payment trying to use it will fail. Diagnosing the payment without checking the gateway is like diagnosing a car that will not start without checking whether it has petrol.
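The "many payments stuck at the same step" sign can be checked mechanically before anyone opens an individual payment. A minimal sketch, assuming you can export the currently stuck payments as (payment ID, last completed step) pairs; the step names and the threshold of three are hypothetical choices, not TPH values.

```python
from collections import Counter


def suspect_dependencies(
    stuck: list[tuple[str, str]], threshold: int = 3
) -> list[str]:
    """Return steps at which at least `threshold` payments are stuck at once.

    `stuck` holds (payment_id, last_step) pairs for payments stuck right now;
    a step shared by many payments points at the dependency behind it.
    """
    counts = Counter(step for _payment_id, step in stuck)
    return sorted(step for step, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Hypothetical payment IDs and step names.
    stuck_now = [
        ("P1", "sanctions_screening"),
        ("P2", "sanctions_screening"),
        ("P3", "sanctions_screening"),
        ("P4", "clearing_ack"),
    ]
    print(suspect_dependencies(stuck_now))  # ['sanctions_screening']
```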

Step 7: Did the posting fail?

Posting is the last step in the payment lifecycle. The accounting entries are generated and sent to the core banking system. If everything else has worked — the message arrived, the validations passed, the product determination routed correctly, the clearing acknowledged — and the payment still appears incomplete, the posting is the place to look.

Posting failures are rare compared to validation failures, but they happen often enough to justify a specific diagnostic step. The common causes:

  • The core banking system rejected the accounting entries — a suspended account, a closed period, a currency configuration mismatch
  • The posting instruction was generated but never sent — a middleware failure between TPH and the core system
  • The posting was sent and acknowledged but the status was not updated — a rare synchronisation gap that makes the payment look incomplete when it is actually fine

Check the core system for the accounting entries before assuming they were not posted. The payment status in TPH and the actual ledger entries in the core system do not always agree, and when they disagree, the core system is usually right.
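The four combinations of TPH status and core ledger entries each point somewhere different. This sketch spells them out, assuming status 999 means completed (as described earlier) and that you can list the core entry references for the payment; the entry references are hypothetical.

```python
COMPLETED = 999  # "completed", as described earlier in this article


def reconcile(tph_status: int, core_entries: list[str]) -> str:
    """Compare TPH's view of a payment with the ledger entries found in core.

    `core_entries` holds whatever entry references your core system returns
    for the payment (hypothetical here); an empty list means none were found.
    """
    posted = bool(core_entries)
    completed = tph_status == COMPLETED
    if posted and completed:
        return "consistent: posted in core and completed in TPH"
    if posted:
        # The synchronisation gap: the payment is fine, the status is stale.
        return "posted in core but TPH status not updated; trust the core entries"
    if completed:
        return "TPH says completed but core has no entries; trust the core and investigate"
    return "not posted: check for a core rejection first, then the middleware between TPH and core"
```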

The diagnostic cheat sheet

When you are on a call and everyone is waiting, work through this in order:

  1. Did the message arrive? — Check PP.MESSAGE.MONITOR. If not present, problem is upstream.
  2. Is it in a repair queue? — Check all three queues. Read the error code. BAC20002 means balance.
  3. What is the payment status? — 19 = warehoused (check the date). 99 = repair (fix and release). 1 = parked (check external dependency). 999 = completed (the problem is after TPH).
  4. Which validation failed? — Message, account, balance, limit, or sanctions. Focus investigation on that layer only.
  5. Did product determination route it wrong? — Check for overlapping product conditions. Silent failure, no error.
  6. Did an external dependency time out? — Check sanctions, clearing, and posting confirmation statuses before investigating the payment.
  7. Did the posting fail? — Check the core system for the actual accounting entries.
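The cheat sheet is an ordered sequence that stops at the first finding, and it can be expressed exactly that way. This is a sketch with only the first three steps filled in; the payment fields (`in_monitor`, `error_code`, `status`) are hypothetical stand-ins for what you read off the screens. The log it returns records every step checked, in order, with its result.

```python
from typing import Callable, Optional

# A check inspects what you know about the payment and returns a finding,
# or None if that step is clear.
Check = Callable[[dict], Optional[str]]


def run_sequence(payment: dict, checks: list[tuple[str, Check]]) -> list[tuple[str, str]]:
    """Run checks in order, stopping at the first finding.

    The returned log lists every step checked and its result, which is also
    the summary you want ready if you have to escalate.
    """
    log: list[tuple[str, str]] = []
    for name, check in checks:
        finding = check(payment)
        log.append((name, finding or "clear"))
        if finding:
            break
    return log


# First three steps only; the payment fields are hypothetical.
CHECKS: list[tuple[str, Check]] = [
    ("1. message arrived", lambda p: None if p.get("in_monitor") else "not in PP.MESSAGE.MONITOR: upstream"),
    ("2. repair queue", lambda p: f"in repair with {p['error_code']}" if p.get("error_code") else None),
    ("3. status", lambda p: f"status {p['status']}" if "status" in p else None),
]


if __name__ == "__main__":
    print(run_sequence({"in_monitor": True, "error_code": "BAC20002"}, CHECKS))
    # [('1. message arrived', 'clear'), ('2. repair queue', 'in repair with BAC20002')]
```

Each further step is one more (name, check) pair appended in order.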

Most payment failures are resolved by step 4. The later steps exist for the ones that are not. Knowing the sequence means you do not start at step 7 and work backwards, which takes considerably longer and involves considerably more people asking for updates.

What experienced people do differently

The experienced TPH support person does not open screens faster. They open fewer screens — the right ones, in the right order, and they know what each screen is actually telling them. The difference is not speed. It is sequence.

The other thing experienced people do is notice patterns. When the third payment this week fails at the same step with the same error code, they stop diagnosing individual payments and start looking at what changed. A configuration change. A new product condition. A clearing gateway that has been gradually slowing down. The experienced person knows that repeated failures are not repeated accidents. They are symptoms of something that has not been found yet.

And when they do escalate, because sometimes you have to, they can say exactly what they checked, in what order, and what each check told them. That conversation takes two minutes instead of twenty, and the person being called can immediately do something useful. This is the difference between escalating well and being the caller people brace themselves for, a subject covered elsewhere in considerably more detail.
