{"id":47503,"date":"2026-03-17T16:20:56","date_gmt":"2026-03-17T13:20:56","guid":{"rendered":"https:\/\/mk.gen.tr\/tidalwave-ai-tops-general-models-in-mortgage-underwriting-accuracy-benchmark\/"},"modified":"2026-03-17T16:20:56","modified_gmt":"2026-03-17T13:20:56","slug":"tidalwave-ai-tops-general-models-in-mortgage-underwriting-accuracy-benchmark","status":"publish","type":"post","link":"https:\/\/mk.gen.tr\/tr\/tidalwave-ai-tops-general-models-in-mortgage-underwriting-accuracy-benchmark\/","title":{"rendered":"Tidalwave AI tops general models in mortgage underwriting accuracy benchmark"},"content":{"rendered":"<p>A joint study from mortgage technology platform <strong><a href=\"https:\/\/www.housingwire.com\/articles\/flat-branch-tidalwave-mortgage-ai\/\">Tidalwave<\/a><\/strong> and researchers at Columbia University found that Tidalwave\u2019s mortgage-trained AI agent produced more accurate answers to loan underwriting questions than a general-purpose large language model.<\/p>\n<p>The benchmarking evaluated Tidalwave\u2019s <a href=\"https:\/\/www.housingwire.com\/articles\/tidalwave-ceo-diane-yu-on-building-an-ai-first-company\/\">SOLO agent<\/a> against Claude 4.5 from <strong>Anthropic<\/strong> on 90 questions commonly asked by loan officers during the mortgage origination process. These included whether payroll matches a stated employer, whether bank statements show <a href=\"https:\/\/www.housingwire.com\/articles\/mba-outlines-buy-now-pay-later-underwriting-concerns-in-fha-letter\/\">buy-now-pay-later<\/a> payments, and whether deposits may come from foreign sources, among other questions.<\/p>\n<p>Overall, Tidalwave\u2019s SOLO recorded 84% accuracy compared with 71% for Claude 4.5, according to the study.<\/p>\n<p>The biggest performance gap was in yes-or-no compliance checks \u2014 the questions used to flag issues such as payroll mismatches, undisclosed debts and suspicious transactions. 
Tidalwave\u2019s SOLO scored 95% accuracy in that category, compared with 42% for the baseline model.<\/p>\n<p>Transaction identification results were closer, with SOLO scoring 83% compared with 80% for Claude 4.5.<\/p>\n<p>On account verification questions, however, Claude 4.5 outperformed SOLO, scoring 86% compared with 67% for Tidalwave\u2019s system.<\/p>\n<p><a href=\"https:\/\/www.housingwire.com\/winner-profile\/2025-woman-of-influence-diane-yu\/\">Diane Yu<\/a>, co-founder and CEO of Tidalwave, said in an interview with <strong>HousingWire<\/strong> that SOLO\u2019s lower score on account verification is intentional, because the system strips out personally identifiable information (PII) before processing requests. The design reflects concerns that general-purpose large language models (LLMs) may use PII sent to them, which would violate customer privacy in mortgage lending.<\/p>\n<p>Tidalwave also attributed the performance gap to differences in how the systems interpret mortgage data. General-purpose models analyze loan files as text, while SOLO is integrated with underwriting systems used by <strong><a href=\"https:\/\/www.housingwire.com\/company\/fannie-mae\/\">Fannie Mae<\/a> <\/strong>and <strong><a href=\"https:\/\/www.housingwire.com\/company\/freddie-mac\/\">Freddie Mac<\/a><\/strong> and trained on structured mortgage datasets, including Uniform Residential Loan Application (<a href=\"https:\/\/www.housingwire.com\/articles\/hud-proposal-industry-standard-urla-form-title-i-loans\/\">URLA<\/a>) files and bank transaction records.<\/p>\n<p>Loan officers increasingly use <a href=\"https:\/\/www.housingwire.com\/articles\/blend-autopilot-mortgage\/\">AI tools<\/a> to review lengthy loan files and manage origination timelines that average more than 40 days, the company said. 
Lenders often lose money on each loan they originate, creating pressure to automate parts of the process.<\/p>\n<p>\u201cForty-two percent on compliance questions should worry every lender relying on off-the-shelf AI right now,\u201d Yu said. \u201cWhen I was building technology at <strong>Better.com<\/strong>, I watched general-purpose tools fail on mortgage data over and over. They\u2019d miss a payroll mismatch or hallucinate a deposit source, and a human had to catch it every time.<\/p>\n<p>\u201cThat\u2019s why we built Tidalwave\u2019s SOLO differently, and that\u2019s why we tested it with Columbia University, not internally. If you\u2019re going to tell lenders your AI is accurate, you should be willing to prove it publicly.\u201d<\/p>\n<p>The benchmark was conducted in the second half of 2025 through a collaboration between Tidalwave\u2019s engineering team and researchers at Columbia. The study evaluated 90 questions across 10 synthetic borrower scenarios, each including a full loan application file and up to two months of bank-statement transaction data.<\/p>\n<p>Yu said the findings represent the first iteration of the benchmark study and that the researchers will continue to test SOLO against the latest publicly available LLMs.<\/p>\n<p>The 90 questions used in the benchmark were written by Tidalwave\u2019s in-house mortgage experts, Yu said. The team based the questions on common usage patterns for Tidalwave\u2019s system, including edge cases such as foreign transactions, mismatches between bank statements and loan applications, and deposits from lesser-known vendors.<\/p>\n<p>Results were measured using an F1 score, according to the technical report.<\/p>\n<p>\u201cWe partnered with Tidalwave on this benchmark to reflect the actual decision points loan officers face during origination, not abstract NLP tasks,\u201d said Zhou Yu, an associate professor at Columbia University. 
<\/p>\n<p>\u201cBy using realistic borrower scenarios, synthetic but structured data, and F1 scoring on both retrieval and yes\/no checks, we can see where systems truly help loan officers and where they quietly fail. We hope this becomes a template for evaluating AI in other high-stakes, regulated workflows as well.\u201d<\/p>\n<p>Tidalwave\u2019s SOLO platform is <a href=\"https:\/\/www.housingwire.com\/articles\/nexa-mortgage-partners-with-tidalwave-to-bring-agentic-ai-platform-to-brokers\/\">used by LOs<\/a> at <strong>NEXA Lending<\/strong> through <strong>Bevri.ai<\/strong>, an independent AI solutions provider. The company also integrates with\u00a0<strong>Plaid<\/strong>,\u00a0<strong>Argyle<\/strong>,\u00a0<strong>Truv<\/strong>\u00a0and\u00a0<strong>ICE Mortgage Technology<\/strong>\u00a0to automate income, employment and asset verification.<\/p>","protected":false},"excerpt":{"rendered":"<p>A joint study from mortgage technology platform Tidalwave and researchers at Columbia University found that Tidalwave\u2019s mortgage-trained AI agent produced more accurate answers to loan underwriting questions than a general-purpose large language model. 
The benchmarking evaluated Tidalwave\u2019s SOLO agent against Claude 4.5 from Anthropic on 90 questions commonly asked by loan officers during the mortgage&#8230;<\/p>","protected":false},"author":0,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/mk.gen.tr\/tr\/wp-json\/wp\/v2\/posts\/47503"}],"collection":[{"href":"https:\/\/mk.gen.tr\/tr\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mk.gen.tr\/tr\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/mk.gen.tr\/tr\/wp-json\/wp\/v2\/comments?post=47503"}],"version-history":[{"count":0,"href":"https:\/\/mk.gen.tr\/tr\/wp-json\/wp\/v2\/posts\/47503\/revisions"}],"wp:attachment":[{"href":"https:\/\/mk.gen.tr\/tr\/wp-json\/wp\/v2\/media?parent=47503"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mk.gen.tr\/tr\/wp-json\/wp\/v2\/categories?post=47503"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mk.gen.tr\/tr\/wp-json\/wp\/v2\/tags?post=47503"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}