4 ways to show customers they can trust your generative AI enterprise tool

Image: four antique keys on a white background. Image Credits: umdash9 / Getty Images

Luigi La Corte

Contributor

Luigi La Corte is co-founder and CEO at Provision.

At the dawn of the cloud revolution, which saw enterprises move their data from on-premises systems to the cloud, Amazon, Google and Microsoft succeeded at least in part because they treated security as a fundamental concern. No large-scale customer would even consider working with a cloud company that wasn’t SOC 2 compliant.

Today, another generational transformation is taking place, with 65% of workers already saying they use AI on a daily basis. Large language models (LLMs) such as ChatGPT will likely upend business in the same way cloud computing and SaaS subscription models did once before.

Yet again, with this nascent technology comes well-earned skepticism. LLMs risk “hallucinating” fabricated information, sharing real information incorrectly, and retaining sensitive company information fed to them by uninformed employees.

Any industry that LLMs touch will require an enormous level of trust between aspiring service providers and their B2B clients, who ultimately bear the risk of poor performance. Clients will want to peer into your reputation, data integrity, security, and certifications. Providers that take active steps to reduce the potential for LLM “randomness” and build the most trust will be outsized winners.

For now, there are no regulating bodies that can give you a “trustworthy” stamp of approval to show off to potential clients. However, here are four ways your generative AI organization can operate as an open book and thus build trust with potential customers.

Seek certifications where you can and support regulations

Although there are currently no specific certifications around data security in generative AI, it will only help your credibility to obtain as many adjacent attestations as possible, like SOC 2 compliance, ISO/IEC 27001 certification, and demonstrated compliance with GDPR (General Data Protection Regulation).

You also want to be up to date on data privacy regulations, which differ regionally. For example, when Meta recently released its Twitter competitor Threads, the app did not launch in the EU amid concerns over the legality of its data tracking and profiling practices.

As you’re forging a brand-new path in an emerging niche, you may also be in a position to help shape regulations. Unlike with Big Tech advancements of the past, organizations like the FTC are moving quickly to investigate the safety of generative AI platforms.

While you may not be shaking hands with global heads of state as Sam Altman has, consider reaching out to local politicians and committee members to offer your expertise and collaboration. By demonstrating your willingness to create guardrails, you’re indicating you only want the best for those you intend to serve.

Set your own safety benchmarks and publish your journey

In the absence of official regulations, you should be setting your own benchmarks for safety. Create a roadmap with milestones that you consider proof of trustworthiness. This may include things like setting up a quality assurance framework, achieving a certain level of encryption, or running a number of tests.

As you achieve these milestones, share them! Draw potential customers’ attention to these attempts at self-regulation through white papers and articles. By showing that safety achievements are front of mind, you’re establishing your own credibility.

You’ll also want to be open about which LLMs or APIs you’re using, as this will give others a fuller understanding of how your technology functions and establish greater trust.

When possible, open source your testing plan/results. Provide highly detailed test cases, with a simple framework composed of questions, answers, and ratings for each against a benchmark.

Open sourcing parts of your process will only build trust with your user base, and they’ll likely ask to see examples during procurement.
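The question, answer, and rating framework described above could be sketched roughly as follows. This is only an illustration, not a prescribed format: the `TestCase` fields, the 0.0 to 1.0 rating scale, the 0.9 benchmark, and the sample legal-document questions are all assumptions made up for the example.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TestCase:
    question: str         # prompt sent to the product
    expected_answer: str  # reference answer agreed on by reviewers
    model_answer: str     # what the product actually returned
    rating: float         # reviewer score, 0.0 (wrong) to 1.0 (fully correct)


def summarize(cases, benchmark):
    """Return the mean rating and whether it meets the chosen benchmark."""
    if not cases:
        raise ValueError("at least one test case is required")
    mean_rating = sum(c.rating for c in cases) / len(cases)
    return {
        "cases": len(cases),
        "mean_rating": round(mean_rating, 3),
        "benchmark": benchmark,
        "meets_benchmark": mean_rating >= benchmark,
    }


# Hypothetical test cases for a contract-review tool.
cases = [
    TestCase("What is the notice period in clause 4?", "30 days", "30 days", 1.0),
    TestCase("Who is the indemnifying party?", "The contractor", "The contractor", 1.0),
    TestCase("What is the liability cap?", "$1 million", "$2 million", 0.0),
    TestCase("When does the agreement expire?", "Dec. 31, 2025", "End of 2025", 0.5),
]

report = summarize(cases, benchmark=0.9)

# A JSON document like this is one possible artifact to publish or share.
published = json.dumps(
    {"summary": report, "cases": [asdict(c) for c in cases]}, indent=2
)
print(published)
```

Publishing the individual cases alongside the summary lets a buyer see exactly where the product fell short (here, the liability cap) rather than taking a single score on faith.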

Back up the data integrity of your product

Liability is a complicated issue. Let’s take the example of risk in the construction industry. Construction firms can outsource risk management to lawyers — which enables the company to hold that third party accountable if something goes wrong.

But if you, as a new provider, offer AI tools that can replace a legal advisor for a 10x–100x lower price, the likely trade-off is that you’ll absorb far less liability. So the next best thing you can offer is integrity.

We think that integrity will look like an auditable quality assurance process that potential customers can peer into. Users should know which outputs are currently “in distribution” (i.e., which outputs your product can provide reliably), and which aren’t. They should also be able to audit the output from tests in order to build confidence in your product. Enabling prospective customers to do so puts you ahead of the curve.

Along those lines, AI providers will need to start explaining data integrity as a new “leave-behind” pillar. In traditional B2B SaaS, businesses address common questions such as “security” or “pricing” with leave-behind materials like digital pamphlets.

Providers will now have to start doing the same with data integrity, diving into why and how they can promise “no hallucination,” “no bias,” “edge-case tested,” and so on. They will always need to backstop these claims with quality assurance.

(As an aside, we’ll likely also see underwriters creating policies for agents’ errors and omissions, once they proliferate.)

Stress test your product until your error rate is acceptable

It may be impossible to guarantee that an LLM-based platform never makes mistakes, but you’ve got to do whatever it takes to bring your error rate down as low as possible. Vertical AI solutions will benefit from tighter, more focused feedback loops, ideally fed by a steady stream of preliminary usage data, that help them decrease their error rate over time.

In some industries, the margin for error may be more flexible than in others: think caricature generators versus code generators.

But the honest answer is that a good error rate is one the client accepts with eyes wide open. In some cases, you want to reduce false negatives; in others, false positives. Error needs closer scrutiny than a single number (e.g., “99% accurate”) allows. If I were a buyer, I would instead ask:

  • “What’s your F1 score?”
  • “When designing, what type of error did you index on? Why?”
  • “In a balanced dataset, what would your error rate be for labeling data?”

These questions will really uncover the seriousness of a provider’s iteration process.
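To see why a single accuracy number can mislead, consider a rough sketch with made-up data: 1,000 documents, of which only 10 contain a risky clause, and a hypothetical model that never flags anything. The labels and predictions below are invented for illustration; precision, recall, and F1 follow their standard definitions.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1, guarding against division by zero."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_negatives": fn,
        "false_positives": fp,
    }


# Imbalanced, hypothetical dataset: 10 risky documents out of 1,000.
y_true = [1] * 10 + [0] * 990
# A model that never raises a flag.
y_pred = [0] * 1000

result = metrics(y_true, y_pred)
print(result)  # accuracy is 0.99, yet recall and F1 are both 0.0
```

The model is “99% accurate” and still misses every risky document, which is why asking about F1 and about which error type the provider optimized for reveals more than the headline figure.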

An absence of regulation and guidelines does not mean that customers are naive when examining your level of risk as an AI provider. A prudent customer will demand that any provider prove that its product can perform within an acceptable error rate, and show respect for robust safeguards. Providers that can’t will surely lose.
