There is a piece of research that every recruiter should have to recite before conducting their first interview. In their landmark 1998 meta-analysis, Frank Schmidt and John Hunter synthesized decades of personnel selection studies and put a number on the validity of the unstructured interview: 0.20. That is the correlation between how an interviewer feels about a candidate and how that candidate will actually perform on the job. For context, a coin flip has a correlation of roughly 0.00, and a perfect predictor would have a correlation of 1.00. The unstructured interview sits between those two extremes, and closer to the coin than most hiring managers want to acknowledge.
That same analysis put the validity of structured interviews at 0.51 — more than twice as predictive. The research has been replicated, extended, and refined many times since 1998. The conclusion has not materially changed. And yet the unstructured interview — the casual 45-minute conversation where a senior engineer asks a candidate what their greatest weakness is — remains the dominant practice at most organizations.
Understanding why it persists, and why it fails in the specific ways it does, is the prerequisite to building a hiring process that actually works.
What "Unstructured" Actually Means
The term can be misleading. An unstructured interview is not necessarily a disorganized one. A hiring manager can be thoroughly prepared, warm, engaged, and deeply experienced — and still be running an unstructured interview if the questions vary between candidates, the evaluation criteria are implicit rather than scored, and there is no standardized scorecard capturing the debrief outcome.
The defining characteristics of an unstructured interview are: (1) questions are chosen in the moment or vary across candidates, (2) there is no behavioral anchor for what constitutes a "good" or "weak" response, and (3) the hiring decision emerges from a gut read rather than a scored rubric. All three conditions often hold at once, and each one independently introduces the error patterns that erode validity.
The Specific Failure Modes
Cognitive science has catalogued the biases that systematically distort unstructured interview judgments. The halo effect causes an interviewer who likes a candidate's energy in the first five minutes to rate subsequent responses more generously than the evidence warrants. Affinity bias causes interviewers to evaluate candidates who share their background or communication style as more competent. First-impression effects are substantial — studies suggest that interviewers form stable impressions within the first two to four minutes, before a meaningful exchange has even occurred.
None of these are character flaws in the interviewers. They are well-documented features of human cognition that operate automatically and are essentially invisible to the person experiencing them. An interviewer who believes they are making a highly accurate judgment based on years of experience may in fact be executing a very fast, very confident version of the same low-validity process a junior recruiter would use.
The interview loop compounds this problem. When five interviewers each run an unstructured conversation, their debrief calibration produces a synthesis of five independent gut reads. The highest-confidence voice in the room tends to anchor the group. Evidence that contradicts the anchor gets rationalized away. The outcome can look like consensus, but it is not convergence on evidence. It is convergence on the most persuasive person's instinct.
A Scenario That Illustrates the Gap
Consider a growing fintech company that opened an interview loop for a staff-level backend engineer in late 2024. The team was expanding quickly — nine open roles across two pods — and the hiring manager, under pressure, gave interviewers significant latitude in what to ask. The debrief format was a Slack thread where people shared impressions.
Candidate A came in with strong whiteboard performance and an assertive communication style. Candidate B was less polished in the room but had detailed, specific answers about production incidents she had handled at a prior company. In the debrief, Candidate A was described as "sharp" and "a cultural fit." Candidate B was described as "quiet" and "hard to read." Candidate A was extended an offer.
Six months later, Candidate A had not shipped a major feature and was struggling with system design at scale. The hiring manager could not point to a single document from the interview loop that would have predicted this. There was no scorecard, no anchored rubric, no record of what specific questions had been asked or how the answers compared to a defined standard. The process had produced a confident hire with essentially no evidentiary trail.
This is not an unusual story. It is the modal hiring story at most growing technology companies.
Why the Unstructured Interview Persists
The persistence of a low-validity practice in the face of decades of evidence deserves an explanation beyond "people don't read research." Several forces conspire to keep the unstructured interview in place.
First, feedback loops in hiring are extremely slow. If a company makes ten bad hires this quarter, the signal that something is wrong may not register clearly for eighteen months, by which point the hiring team has moved on, the process has shifted, and attribution is impossible. The speed mismatch between the decision and its consequences protects bad processes from being identified as such.
Second, the unstructured interview feels rigorous to the people conducting it. A seasoned hiring manager has often built a rich mental model of what "good" looks like in their domain. That model is real, but it is encoded as implicit pattern recognition, not as a reproducible scoring rubric. It can pick out the clearly strong and the clearly weak candidates at the extremes, but in the middle of the distribution, where most hiring decisions actually live, it produces noise rather than signal.
Third, structured interviewing requires upfront work that most hiring loops are not resourced to do. Building a proper interview kit, with anchored behavioral questions, a rubric built on behaviorally anchored rating scales (BARS) for each competency, and a standardized debrief format, takes real investment. When a position needs to be filled in three weeks, that investment does not get made.
What Structured Methodology Actually Changes
We want to be precise here: structured interviewing does not eliminate bias, and it does not guarantee good outcomes. A structured interview conducted by a team that has not been trained to use the rubric will produce better-documented bad decisions, not better decisions. The mechanism matters as much as the format.
What structured methodology actually changes is the surface area for bias to operate on. When every candidate answers the same behavioral questions, and every answer is scored against the same anchored rubric before debrief, the signal-to-noise ratio of the evaluation improves substantially. The scorecard creates an evidentiary record that is independent of any single interviewer's impression. Debrief calibration can operate on scored data rather than impressions, which means the highest-confidence voice in the room has less power to anchor the group away from the evidence.
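As a minimal sketch of what "scored against the same anchored rubric before debrief" can look like in practice, the Python below models a scorecard entry and averages the independently submitted scores per candidate and competency. Everything named here is hypothetical: the RUBRIC contents, the 1 to 5 scale, the Scorecard fields, and the validate and summarize helpers are illustrative assumptions, not a prescribed format.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Hypothetical anchored rubric: selected score levels are tied to observable behavior.
RUBRIC = {
    "incident_response": {
        1: "Cannot describe a specific production incident",
        3: "Describes an incident and own actions, little root-cause detail",
        5: "Covers detection, mitigation, root cause, and follow-up work",
    },
}

@dataclass(frozen=True)
class Scorecard:
    interviewer: str
    candidate: str
    competency: str
    score: int      # assumed 1 to 5 scale
    evidence: str   # what the candidate actually said or did

def validate(card: Scorecard) -> None:
    """Reject entries that cannot be tied back to the rubric or to evidence."""
    if card.competency not in RUBRIC:
        raise ValueError(f"unknown competency: {card.competency}")
    if not 1 <= card.score <= 5:
        raise ValueError(f"score out of range: {card.score}")
    if not card.evidence.strip():
        raise ValueError("a score must cite evidence from the interview")

def summarize(cards: list[Scorecard]) -> dict[str, dict[str, float]]:
    """Average the independently submitted scores per candidate and competency."""
    grouped = defaultdict(lambda: defaultdict(list))
    for card in cards:
        validate(card)
        grouped[card.candidate][card.competency].append(card.score)
    return {
        candidate: {comp: mean(scores) for comp, scores in by_comp.items()}
        for candidate, by_comp in grouped.items()
    }

# Illustrative data only, submitted before anyone sees another interviewer's card.
cards = [
    Scorecard("interviewer_1", "A", "incident_response", 2, "Spoke in generalities"),
    Scorecard("interviewer_2", "A", "incident_response", 3, "One outage, no root cause"),
    Scorecard("interviewer_1", "B", "incident_response", 5, "Full timeline and postmortem"),
    Scorecard("interviewer_2", "B", "incident_response", 4, "Clear mitigation steps"),
]
summary = summarize(cards)
```

Requiring an evidence string next to each score is the design choice that matters most in this sketch: it gives the debrief something to compare other than impressions.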
The Schmidt-Hunter validity coefficient for structured interviews (0.51) reflects this: you are not predicting job performance perfectly, but you are predicting it at a level that is meaningfully better than chance and substantially better than the unstructured alternative. In a loop with four interviewers, structured methodology also enables you to measure inter-rater reliability — whether your interviewers are actually seeing the same candidate, or whether their scores diverge in ways that indicate the rubric needs refinement.
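A rough way to check whether interviewers are "seeing the same candidate" is to correlate their scores across candidates, one competency at a time. The sketch below uses the mean pairwise Pearson correlation as a simple stand-in for a formal inter-rater reliability statistic such as an intraclass correlation. The competency names, the scores, and the function names are all made up for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two raters' scores for the same candidates."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a rater gave every candidate the same score")
    return cov / sqrt(var_x * var_y)

def mean_pairwise_agreement(scores_by_rater: dict[str, list[float]]) -> float:
    """Average the correlation over every pair of raters."""
    pairs = list(combinations(scores_by_rater.values(), 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

# Hypothetical scores: four interviewers, the same five candidates, 1 to 5 scale.
incident_response = {
    "rater_1": [4, 2, 5, 3, 1],
    "rater_2": [4, 3, 5, 3, 2],
    "rater_3": [5, 2, 4, 3, 1],
    "rater_4": [4, 2, 5, 2, 1],
}
communication = {
    "rater_1": [4, 2, 5, 3, 1],
    "rater_2": [2, 4, 3, 5, 1],
    "rater_3": [3, 3, 2, 4, 4],
    "rater_4": [5, 1, 3, 2, 4],
}

strong = mean_pairwise_agreement(incident_response)  # raters rank candidates alike
weak = mean_pairwise_agreement(communication)        # raters disagree on ranking
print(f"incident_response: {strong:.2f}, communication: {weak:.2f}")
```

In this made-up data the first competency shows high agreement and the second shows almost none, which is the pattern that would suggest the second rubric's anchors need rework. Correlation only captures whether raters rank candidates similarly. It does not detect one interviewer scoring everyone a point higher than the rest, so a real analysis would likely want a statistic that also accounts for level differences.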
The First Step Is Admitting the Problem Exists
The gap between 0.20 and 0.51 is not small once you account for the scale at which most growing organizations hire. A company making fifty engineering hires a year with a process that operates at 0.20 validity is making a lot of expensive guesses. Some of those guesses will be correct: skilled candidates can succeed even in bad processes, and some talented interviewers produce reliable judgments through informal mechanisms. But the variance is high, the process is not improving, and the documentation trail is too thin to support any meaningful audit of what went wrong when things go sideways.
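One way to make that gap concrete is the standard selection-utility relationship: the expected job performance of the people hired, in standard deviations above the applicant average, equals the validity coefficient times the average standardized interview score of those hired. The sketch below assumes normally distributed scores, strict top-down hiring, and a hypothetical selection ratio of one hire per ten applicants. The function name is illustrative, and the validity figures are the two quoted in this article.

```python
from statistics import NormalDist

def expected_gain_sd(validity: float, selection_ratio: float) -> float:
    """Expected performance of hires, in SD units above the applicant mean.

    Assumes interview scores and job performance are bivariate normal and
    that the top `selection_ratio` fraction of applicants is hired.
    """
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1 - selection_ratio)
    # Mean standardized interview score among those above the cutoff.
    mean_score_of_hires = std_normal.pdf(cutoff) / selection_ratio
    return validity * mean_score_of_hires

SELECTION_RATIO = 0.10  # hypothetical: hire the top 10% of applicants

unstructured = expected_gain_sd(0.20, SELECTION_RATIO)
structured = expected_gain_sd(0.51, SELECTION_RATIO)
print(f"unstructured: {unstructured:.2f} SD, structured: {structured:.2f} SD")
```

Under these assumptions the expected gain scales linearly with validity, so the structured process yields about two and a half times the performance lift per hire regardless of the selection ratio chosen.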
The research has been telling us the same thing for twenty-five years. The unstructured interview is not a reasonable baseline to optimize from — it is the problem that structured hiring methodology exists to solve.