Best practices for cloud-based machine learning platforms


This article is part of a VB Lab Insights series paid for by Capital One.

Most people are familiar with major technology platforms, like iOS, Windows, and AWS. Platforms, in their essence, are a set of tools that serve as a starting point for developing, experimenting, and scaling other applications. They provide many of today's most advanced technological capabilities and cutting-edge customer experiences.

Many companies are developing their own sophisticated internal platforms to keep pace with growing volumes of data and the increasing complexity of artificial intelligence (AI) and machine learning (ML). By 2025, cloud-native platforms are expected to serve as the foundation for more than 95% of new digital initiatives, compared to less than 40% in 2021.

Enterprise technology platforms have been transformative in my experience: they enable cross-functional teams to test, launch, and learn at a rapid rate; they eliminate duplicate work and standardize capabilities; and they provide consistent and integrated experiences.

The evolution of enterprise platforms

Capital One's commitment to becoming the first financial institution in the United States to go all in on the cloud has aided in the creation and delivery of powerful, personalized customer experiences. With that robust foundation, we were able to leverage big data to extend and enhance our enterprise platforms to provide new, more meaningful customer experiences.

Much of our work in this area is already generating positive results for the business and for our customers. For example, our fraud decisioning platform was built from the ground up to make complex real-time decisions. By leveraging huge amounts of data and enabling model updates in days (compared to months), the platform helps protect millions of customers from card fraud and can be utilized by diverse stakeholders across the organization.

I've learned a lot from my experience leading teams to deliver enterprise technology platforms. These are the lessons that stand out:

  • Work backwards from a well-defined end state: Before you start to build, take the time to align on the end state architecture and your plan to iterate your way to that destination. Make sure your architecture is designed for self-service and contribution from the start. Better yet, design the platform assuming that you will expand it to users outside of your immediate organization or line of business. Assume that over time you will want to swap out components as technology changes.
  • Estimate how long you think it will take, then double it: It is important to take the time to brainstorm all of the capabilities that you need to build at the outset and then assign a T-shirt-size level of effort to each component. Once your tech teams marry this with velocity to estimate how long it will take to build each feature, add a 50% buffer. In my experience, this estimate ends up being surprisingly accurate.
  • Focus on business outcomes: Building great platforms can take a long time. It is important to sequence the work so that business value can be achieved along the way. This motivates the team, builds credibility and creates a virtuous cycle.
  • Be radically transparent and overcommunicate: Share decisions, progress and roadmaps with stakeholders liberally. In addition to articulating what you are working on, also articulate what you are not currently prioritizing. Invest in documentation that enables contribution as well as easy onboarding to the platform.
  • Start small: Even the best testing and QA environment can miss issues which are not found until something is put into production. For big changes that will have meaningful customer impact, always start with a tiny population and then ramp up once you see things working in production at a small scale. When possible, use associates only for the initial population when a change impacts external customers.
  • Get serious about being well managed: Platform owners should obsess about platform performance. All issues should be self-identified through controls and automated alerts. Exceptions should be addressed quickly. Root cause analysis of issues as well as changes to prevent recurrence should be prioritized. A lack of issues should be properly celebrated so that teams know it is appreciated.
  • If it seems too good to be true, it probably is: Exception monitoring is a great way to ensure that your execution matches your intent. Often the goal is to have zero exceptions. For example, latency should never exceed 200 milliseconds. If your exception reporting NEVER shows any exceptions, it's possible that the monitoring is broken. Always force an exception to make sure that it triggers properly. I've learned this one the hard way.
  • A happy team is a productive team: Celebrate accomplishments, recognize team members when they go above and beyond and create a psychologically safe environment. Measure team happiness (with a quick 1-5 scale) regularly and give teams the space to discuss what would make them happier and the autonomy to try things out to squash dissatisfiers.
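
To make the advice about forcing an exception concrete, here is a minimal Python sketch of a monitor that checks itself. It is an illustration only, not a description of any platform discussed here: the `LatencyMonitor` class, the `self_test` function and the "synthetic-self-test" label are hypothetical names, and the only detail taken from the text is the 200-millisecond limit.

```python
from dataclasses import dataclass, field
from typing import List

LATENCY_LIMIT_MS = 200.0  # threshold from the example in the text


@dataclass
class LatencyMonitor:
    """Hypothetical monitor that records an alert when latency exceeds a limit."""
    limit_ms: float = LATENCY_LIMIT_MS
    alerts: List[str] = field(default_factory=list)

    def record(self, latency_ms: float, source: str = "production") -> bool:
        """Record one observation; return True if it triggered an alert."""
        if latency_ms > self.limit_ms:
            self.alerts.append(
                f"{source}: {latency_ms:.0f} ms exceeds {self.limit_ms:.0f} ms"
            )
            return True
        return False


def self_test(monitor: LatencyMonitor) -> bool:
    """Force an exception with a synthetic reading and confirm an alert fires."""
    before = len(monitor.alerts)
    monitor.record(monitor.limit_ms + 1, source="synthetic-self-test")
    return len(monitor.alerts) == before + 1


monitor = LatencyMonitor()
for latency in (120.0, 95.0, 180.0):  # healthy traffic, all under the limit
    monitor.record(latency)

print(monitor.alerts)      # [] (quiet, but is the monitor actually working?)
print(self_test(monitor))  # True only if the alert path really fires
```

Running a self-test like this on a schedule is one way to tell "no exceptions occurred" apart from "the monitoring is broken." In a real system the synthetic reading would need to travel through the same alerting path as production data, and be labeled so that it is not mistaken for a genuine incident.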

The possibilities are endless when a team has a solid culture backed by the right platform technology. With both in place, businesses can move faster and experiment with newer, more innovative products and experiences when they need them the most.

Marcie Apelt is Managing Vice President (MVP) of ML/AI Product at Capital One.
