Equifax, Facebook and Dangers of Centralization [of Data]

Equifax. It’s hard to sit on my hands and not write on this one. My perspective is shaped by running 2 of the largest online banks in the world, developing state of the art fraud prevention systems with the top 20 banks, working with Google and today creating Commerce Signals.

Enron has new competition for the company name that denotes loss and fraud. Equifax may be the single largest breach of consumer information in history. It is everything from Social Security numbers to DOBs, DL #s, …. How did Equifax get our data? Banks, collection agencies, utilities and other billers gave it to them (see FDIC paper). By collecting all of your account balances, when you opened accounts, your payment status with the accounts you have, …etc., Equifax could develop a “complete picture” of your financial and non-financial activity. For example, if you opened 5 credit card accounts last week, it could be a signal that you are in an atypical situation (i.e., a credit risk).

I’m not going to take apart what happened, but rather look at why we centralize data and what MUST BE DONE to fix it, particularly with banks.

Centralizing Data – 600 Yrs

Big data, and the analytics that power it, is a great business. Players like Equifax enjoy a virtuous cycle: more data leads to better products, better products to more use, more use to more data. Equifax stock has grown 4x over the last 3 yrs because of this dynamic. Data driven decisions (ie credit score) are much better than an intuition or a hunch. Data is transforming the way companies find new opportunities, better match consumer demand and improve efficiency.

Since the printing press (~600 years), we have lived in a world of “information centralization” (i.e., libraries, professional societies and Big Data). Centralization enabled scarce information (ex books) to be discoverable without relationships. Broad discovery accelerated the creation of new ideas/information and new centers of specialists. These “hubs” of information became key societal institutions (schools, monasteries, government, companies) which subsequently expanded the number of specialties and the creation of new content/ideas. Communication between specialists created networks across hubs, both advancing the art and broadening the availability of specialists geographically.

Obviously, the internet provided scale-free distribution of [public] information thus allowing decentralized specialist networks to interact AND ubiquitous (non-specialist) access to information. In a scale free network, the ability to access information is constrained by discovery, but the ability to ACT on information remains the same as that faced within verbal communication. (My next blog is about Discovery)

Anthropologically, we have different levels of trust with each interaction. The actionability of verbal communication depended not only on trust, but also on a common understanding of both the language and the environment (i.e., a common reference point) by which the information was conveyed. The ability to control our personal information is key to our role in society and our environment.

When observers can combine their views in an uncontrolled fashion (i.e., aggregation), uncontrolled insight can be gained. Not just insight into past behaviors, but predictive insight on future behaviors (see article). Zola’s Project Insight Algorithm in Captain America captures a nefarious future use: https://www.youtube.com/watch?v=qGpz8Q4Jq6A

Most businesses are willing participants in the opaque exchange of data. Unfortunately it is generally accepted that the only way to make data decisions is to have all of your data lumped together in one place where it can be actioned. Certainly, having all of the data co-located allows for lots of great free-form analysis… after all you don’t know what you might uncover.. but that IS THE PROBLEM: once data leaves your facilities you have lost control.

  • How can it be combined with other data?
  • What new insights are revealed?
  • How will it be used?
  • Who will use it?

Economically, the centralization of behavior and the destruction of anonymity impact both supply and demand thus disrupting both pricing mechanisms and the function of commercial networks (the advantages of markets vs direct transactions). Economists have demonstrated how imperfect information flow shapes markets and margins. For example, markets have played the central role in economic discovery (goods and price). If I can predict demand and influence your behavior before you are “in a market” I have destroyed “the market” and created a new orchestrator of demand (see Transformation of Commercial Networks).

As I outlined in Small Wins, my hope is that we are in a “Big Data 1.0” world that is evolving to a more thoughtful treatment of consumer information. While initial treatment of private data with the same tools and techniques developed for public data is a logical first step, we must seek to change the process. Although centralization of information provides enormous advantages to scale (governments, vendors, …etc), federated data and federated discovery provide advantages to consumers, privacy and small businesses.

Federated Data™

Centralizing data is a version 1.0 approach to data analytics, optimized for technology (not privacy or economics). Centralizing data leads to centralized intelligence (hence the NSA) and a very very virtuous cycle. Alternatively, a federated model allows data to remain with the owner while “questions are answered”. Federated data allows for a chain of command on what data flowed to whom for what purpose. This creates trust, transparency and control… none of which are descriptors for today’s data market.

Let’s take a fraud example. Banks could fight fraud in account opening by consolidating all consumer data in one single Equifax-style bureau, where each bank is charged to “ask a question” of the centralized entity. Alternatively, they could construct a common network, with consistent communication and standards, in which each member bank would be required to respond (individually) to a question.

Question: Did xxx open an account in last 5 days?

Answer: No

If someone stole either side of this transaction, who would care? Using an anonymized ID (a token) and encrypting the transmission improves security and trust, and makes the question non-repudiable.
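To make the question/answer idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Bank` class, the `ask_network` function and the shared `NETWORK_KEY` are hypothetical names, not anything a real bank network uses. The token is a keyed hash (HMAC-SHA256) of a consumer identifier, so the raw identifier never travels. Transport encryption is left out, and a shared-key HMAC alone does not give non-repudiation; that would need asymmetric signatures.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical shared secret for the bank network (an assumption for this sketch).
NETWORK_KEY = b"demo-network-key"


def consumer_token(consumer_id: str) -> str:
    """Derive an anonymized token so the raw identifier never leaves the bank."""
    return hmac.new(NETWORK_KEY, consumer_id.encode("utf-8"), hashlib.sha256).hexdigest()


@dataclass
class Bank:
    name: str
    # token -> list of account opening dates; this data stays inside the bank
    openings: dict = field(default_factory=dict)

    def open_account(self, consumer_id: str, opened_on: date) -> None:
        self.openings.setdefault(consumer_token(consumer_id), []).append(opened_on)

    def opened_recently(self, token: str, as_of: date, days: int = 5) -> bool:
        """Answer only yes/no: did this token open an account in the last `days` days?"""
        window = timedelta(days=days)
        return any(timedelta(0) <= as_of - d <= window for d in self.openings.get(token, []))


def ask_network(banks, consumer_id: str, as_of: date, days: int = 5) -> dict:
    """Send the same tokenized question to every member bank and collect the answers."""
    token = consumer_token(consumer_id)
    return {bank.name: bank.opened_recently(token, as_of, days) for bank in banks}
```

The asking bank learns one bit per member bank and nothing else. The account records themselves never move, which is the whole point of the federated model.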

Why don’t we do it this way today?

  • A: 10 yrs ago we didn’t have the network speed we have today and it was just easier to send over a file (the bank world lives in batch).
  • B: There is no economic incentive for incumbents to enable fine grained control

The federated model more closely resembles nature, where intelligence is localized. Humans communicate based upon our level of trust and the objective of the conversation. This ability to throttle information allows the owner of information to control its flow, use, granularity, and destination. For more on this, see my blog “The Day Big Data Died”.

The inventor of the World Wide Web (Tim Berners-Lee) is seeking to reclaim it from Google and Facebook (see this excellent article in Digital Trends). Tim’s new project, underway at his MIT lab, is called Solid (“social linked data”), a way for you to own your own data while making it available to the applications that you want to be able to use it. This federated data design (Tim’s SOLID/Data Pods) is also core to what we have built at Commerce Signals (see Small Wins).

Bank Actions

Beyond the bureaus, most banks have NO IDEA how their data is being used. Banks give out raw data to the bureaus, and their marketing teams have given “anonymized” data to many other entities (i.e., Argus, Affinity, Cardlytics, …etc). I can assure you that for every effort banks make to anonymize their data, these other parties work to de-anonymize it (for example, Argus looks at recurring bill payments and other things). This de-anonymized information is then labeled the property of the aggregator, and individual consumer behavior is then sold (ex Argus and Nielsen/NBI and Exelate). See the Google/Epic complaint. That’s right: bank data is flowing here without the knowledge (or permission) of banks or their consumers.

As I’ve stated, the role of banks is to be an intermediary to commerce. Banks MUST PLAY a role in data, both in support of their core business and in support of retailers and consumers. Data also plays an outsize role in the future of bank margin. However, payment data in particular is LEAKING and the risk to everyone is far greater than Argus.

Transaction data is located with many different entities (shown below), each with a different granularity (i.e., SKU detail), breadth (ex all merchants/banks), consumer permission, regulatory regime, timeliness, …etc. For example, Issuers hold consumer information, Networks are a “meta directory of commerce” (all banks, all merchants), and processors have all non-cash payments, location data and POS transaction identifiers.

In one example, Epson has started a new data business based on information sent TO small printers within the merchant’s POS. This includes the customer NAME and last 4 digits of the consumer’s card.

Bank Actions

#1 INVENTORY: You MUST know where your data flows TODAY. What data flows under which agreement? The bank with the best data privacy approach today is JPMC. It doesn’t hurt that Jamie’s head of Data (Len Laufer) was the founder of Argus and knows exactly what is going on (hence JPMC is the only bank to pull out of Argus).

#2 Warning Shots/ Audit of Use/ Indemnification. Send a note to all of your vendors telling them how seriously you are taking this. Tell them you want regular reports of what data is being used for what purpose (and by whom). BTW.. this is what we do at my company.

#3 Collaboration. Collective action = collective benefit (but NOT collective data – aggregation). Data is core to bank margin and profitability. Look at how you collaborate with key parties in banking. EU-driven initiatives like SEPA and PSD2 are distractions meant to commoditize financial services. Don’t participate in anything without a sound business driver.. I’ve never met a regulatory-driven initiative that helped profitability. The Equifax breach has exposed many new threat vectors. US banks have a secret weapon in the fight against data breach: Early Warning Services (bank owned). I’m not going to educate fraudsters on what this team does.. but they have proven their ability to help banks manage fraud by sharing insight.

#4 Partner Risk Assessment. Assess the risks with your key partners. Who has your data? What are the contracts? Where is your data being used without your consent? What are your data security standards and how often do you audit compliance?

#5 Define your model of collaboration (end state). Exchanging data is NOT complex.. most 6 yr olds can do it. Is there another way to manage risk beyond consolidation of data? Don’t get caught up in technology, that is what led you here. No one bank can do this on its own. To collaborate you must find an entity to act within (avoid collusion, emphasize fraud management).

#6 Stop data flow where there is no value. Do you really want to give out all of your consumer level transaction information to a rewards or market analytics vendor? Ask Cardlytics what happened last month to Bank of America and Citibank data…

#7 Tokenization. Stop the use of consumer SSNs. The SSN is no longer unique or valuable. Create new consumer tokens and methods of identifying consumers. Integrate with mobile.
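One way to picture consumer tokens that replace the SSN is a small token vault: the bank hands each partner its own random token for a consumer, so two partners cannot join their files on a common key. This is only a sketch under stated assumptions. The `TokenVault` class and its methods are hypothetical names, the storage is in memory, and a real vault would need persistence, access control and audit logging.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault issuing per-partner random consumer tokens."""

    def __init__(self):
        self._by_key = {}    # (consumer_id, partner) -> token
        self._by_token = {}  # token -> (consumer_id, partner)

    def issue(self, consumer_id: str, partner: str) -> str:
        """Return the partner-specific token, creating a random one on first use."""
        key = (consumer_id, partner)
        if key not in self._by_key:
            token = secrets.token_urlsafe(16)  # random, carries no consumer data
            self._by_key[key] = token
            self._by_token[token] = key
        return self._by_key[key]

    def resolve(self, token: str, partner: str) -> str:
        """Map a token back to the consumer, only for the partner it was issued to."""
        if token not in self._by_token:
            raise KeyError("unknown or revoked token")
        consumer_id, issued_to = self._by_token[token]
        if issued_to != partner:
            raise PermissionError("token was not issued to this partner")
        return consumer_id

    def revoke(self, consumer_id: str, partner: str) -> None:
        """Cut off a partner; a later issue() call hands out a fresh token."""
        token = self._by_key.pop((consumer_id, partner), None)
        if token is not None:
            del self._by_token[token]
```

Unlike an SSN, a token like this can be revoked and reissued after a breach, and it is worthless to anyone other than the partner it was issued to.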

#8 Migrate high risk data flows to the new (federated) model described above.

I would be glad to talk to any bank about these items..
