Most structured interview programs fail not in intent but in execution. A team decides to run behavioral interviews. They pull a question bank from a job description. They give each interviewer a rating scale from 1 to 5. The debrief produces five scores between 3 and 4, a lengthy discussion, and ultimately a decision that looks like it came from gut feel anyway. The rubric existed on paper but did no actual work.
The reason is almost always the same: the rating scale had no behavioral anchors. A "4 out of 5" in isolation means nothing. It only carries information when it corresponds to a specific, observable description of what a 4 actually looks like in response to a particular question. That is what Behaviorally Anchored Rating Scales (BARS) provide, and it is why structured interview design treats them, rather than generic numeric scales, as the technical standard.
This article walks through how to build a BARS-anchored rubric for a software engineering role from scratch. The principles generalize to any knowledge-work position.
Start With Competencies, Not Questions
The first mistake in rubric design is leading with questions. Questions are the output of the process, not the input. You begin with competencies — the observable behaviors that actually predict success in the role — and derive questions from those.
For a mid-level software engineer, a minimal competency model might include: technical problem-solving under ambiguity, code quality and system thinking, cross-functional communication, ownership and incident response, and learning agility. Each of these is a dimension that a structured interview is trying to measure. The questions are the instruments; the competencies are what you are measuring.
A common error at this stage is conflating competencies with skills. "Knows Python" is a skill — it is either present or absent and can be screened in a technical assessment. "Learns new tooling and adapts patterns under time constraint" is a competency — it is a behavioral disposition that predicts performance across contexts and changes slowly. BARS-anchored rubrics are designed for competencies, not skills.
Writing the Behavioral Anchors
Once you have a competency, you need to define what behaviors at each rating level look like. A five-point scale works well in practice: 1 is significantly below expectation, 2 is below expectation, 3 meets expectation, 4 exceeds expectation, and 5 is exceptional. The behavioral anchors are the descriptions that make those numbers real.
Take the competency "ownership and incident response." A poorly anchored rubric might describe a 5 as "demonstrates strong ownership." That is a label, not an anchor. A BARS-anchored description of a 5 for this competency would read something like: "Candidate describes an incident where they identified the root cause without being assigned to do so, communicated proactively to stakeholders outside their immediate team, implemented a fix and a monitoring addition, and ran a written post-mortem that changed process for others on the team. Actions were self-initiated and sustained across the full lifecycle of the issue."
A 2 for the same competency might read: "Candidate describes their role in incident response but primarily in reactive terms — they fixed what they were asked to fix, escalated when blocked, but did not describe proactive identification or systemic follow-through."
The anchor is a behavioral description that an interviewer can match against what a candidate actually said. It is not an abstract quality judgment. This is the mechanism by which BARS improves inter-rater reliability — multiple interviewers applying the same anchor to the same candidate response will converge more reliably than multiple interviewers applying their personal interpretation of "demonstrates strong ownership."
Connecting Anchors to Interview Questions
Each competency needs at least one primary behavioral question designed to elicit evidence for that dimension, and ideally a follow-up probe for each anchor level. The STAR format (Situation, Task, Action, Result) is a practical guide for structuring both the questions and the evaluation — behavioral anchors should be written to match the action and result components of a STAR response, not just the situation framing.
For the ownership competency, a primary question might be: "Tell me about a time you identified a problem in a system you owned — not one you were assigned to fix, but one you noticed on your own. Walk me through what you did." The follow-up probes exist to move the candidate toward the evidence you need: "What did you change in your monitoring after this?" "Who else was affected, and how did you communicate with them?" "What would you do differently if you were designing the system now?"
This is the interview kit in miniature: the question, the probes, and the anchored rubric for scoring the response. When these three components are bundled and distributed to every interviewer before the interview loop begins, you have a structured interview process. When any one is missing, validity degrades.
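One way to make the three-component bundle concrete is as a small data structure. The sketch below is illustrative only: the `InterviewKit` class and its `missing_components` method are hypothetical names, not part of any ATS or library, and the anchor strings are abbreviated from the ownership example above.

```python
from dataclasses import dataclass, field


@dataclass
class InterviewKit:
    """One competency's kit: primary question, probes, and anchored rubric."""
    competency: str
    question: str
    probes: list[str] = field(default_factory=list)
    # Maps a rating level (1 to 5) to its behavioral anchor text.
    anchors: dict[int, str] = field(default_factory=dict)

    def missing_components(self) -> list[str]:
        """Return the names of any of the three components that are empty."""
        missing = []
        if not self.question.strip():
            missing.append("question")
        if not self.probes:
            missing.append("probes")
        if not self.anchors:
            missing.append("anchors")
        return missing


# Hypothetical kit built from the ownership example (anchor text abbreviated).
ownership_kit = InterviewKit(
    competency="ownership and incident response",
    question=(
        "Tell me about a time you identified a problem in a system you owned, "
        "one you noticed on your own. Walk me through what you did."
    ),
    probes=[
        "What did you change in your monitoring after this?",
        "Who else was affected, and how did you communicate with them?",
        "What would you do differently if you were designing the system now?",
    ],
    anchors={
        2: "Reactive: fixed what was asked, escalated when blocked, "
           "no proactive identification or systemic follow-through.",
        5: "Self-initiated root-cause work, proactive stakeholder communication, "
           "fix plus monitoring addition, written post-mortem that changed process.",
    },
)

# An empty list means the kit is complete enough to distribute.
print(ownership_kit.missing_components())
```

Running a check like this before each loop is a cheap way to catch a kit that went out with a question but no anchors.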
A Practical Build: Staff Engineer Rubric at a 60-Person Fintech
To make this concrete: imagine a growing fintech company with 60 employees hiring three staff engineers in Q1. They are running a four-stage loop: a hiring manager screen, a technical system design round, a behavioral round focused on cross-functional work, and a bar-raiser conversation. Previously, each interviewer arrived with their own question list. Debrief scores varied from 2 to 5 on the same candidates without clear rationale.
The team spent two days building a BARS rubric from scratch. They identified four competencies for the staff level: system thinking at scale, cross-functional influence, engineering quality culture, and ambiguity navigation. For each competency, they wrote behavioral anchors at levels 1, 3, and 5 — with the understanding that 2 and 4 fall between anchors and can be scored by interpolation. Each interviewer was assigned one or two competencies based on their round, with dedicated questions and written anchor cards distributed through their ATS integration before each loop.
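The interpolation convention can be sketched as a lookup that tells an interviewer which written anchors to read for a given score. This is a minimal illustration, assuming anchors are stored as a level-to-text mapping. The function name `anchors_for_score` and the placeholder anchor strings are hypothetical, since the team's actual anchor wording is not given here.

```python
def anchors_for_score(score: int, anchors: dict[int, str]) -> list[tuple[int, str]]:
    """Return the anchor(s) relevant to a score.

    An anchored level returns its own anchor. An unanchored level returns
    the nearest anchored levels below and above it, so the interviewer can
    judge where between the two the response falls.
    """
    if score in anchors:
        return [(score, anchors[score])]
    lower = max((lvl for lvl in anchors if lvl < score), default=None)
    upper = min((lvl for lvl in anchors if lvl > score), default=None)
    return [(lvl, anchors[lvl]) for lvl in (lower, upper) if lvl is not None]


# Placeholder text only; real anchors would be full behavioral descriptions.
influence_anchors = {
    1: "placeholder anchor for level 1",
    3: "placeholder anchor for level 3",
    5: "placeholder anchor for level 5",
}

for level, text in anchors_for_score(4, influence_anchors):
    print(level, text)
```

A score of 4 here returns the level 3 and level 5 anchors, which is the pair the interviewer is interpolating between.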
In the first three loops run under this format, debrief calibration time dropped significantly because interviewers arrived with scored cards rather than impressions. Divergence between high and low scores was immediately visible, and the team could point to specific response evidence for each disagreement rather than arguing about abstract "culture fit." One candidate who would likely have received an offer under the old process was declined because their behavioral responses scored consistently at 2 on the cross-functional influence dimension, a pattern that only became visible when all three behavioral round interviewers had scored the same competency against the same anchors.
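To show how scored cards make divergence visible before the debrief starts, here is a small sketch that groups scores by competency and flags any competency where the spread exceeds a threshold. The function names, the interviewer labels, the sample scores, and the one-point default threshold are all assumptions for illustration, not data from the loops described above.

```python
from collections import defaultdict


def score_spread(scorecards: list[tuple[str, str, int]]) -> dict[str, int]:
    """Map each competency to (highest score - lowest score) across interviewers.

    Each scorecard entry is (interviewer, competency, score).
    """
    by_competency: dict[str, list[int]] = defaultdict(list)
    for _interviewer, competency, score in scorecards:
        by_competency[competency].append(score)
    return {comp: max(scores) - min(scores) for comp, scores in by_competency.items()}


def flag_divergent(scorecards: list[tuple[str, str, int]], max_spread: int = 1) -> list[str]:
    """Return competencies whose score spread exceeds max_spread, sorted by name."""
    spreads = score_spread(scorecards)
    return sorted(comp for comp, spread in spreads.items() if spread > max_spread)


# Hypothetical scorecards for one candidate.
cards = [
    ("interviewer_a", "cross-functional influence", 2),
    ("interviewer_b", "cross-functional influence", 2),
    ("interviewer_c", "cross-functional influence", 2),
    ("interviewer_a", "system thinking at scale", 2),
    ("interviewer_d", "system thinking at scale", 5),
]

print(flag_divergent(cards))  # ['system thinking at scale']
```

In this made-up data, the consistent 2s on cross-functional influence need no reconciliation, while the 2-versus-5 split on system thinking is the item the debrief should spend its time on.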
What BARS Does Not Solve
We want to be direct about the limits. BARS-anchored rubrics produce more reliable and more auditable evaluations. They do not produce unbiased ones. If your question bank systematically favors candidates with high-resource professional backgrounds — people who have worked at well-resourced companies with formal post-mortems and sophisticated monitoring stacks — then a 5-anchor that requires that background will screen for that experience rather than the underlying competency.
Rubric quality depends on the caliber of the behavioral anchors, which in turn depends on the quality of the job analysis that produced the competencies. A competency model that over-indexes on how your most visible star performer behaves will embed that person's specific background into the selection instrument. Building good anchors requires deliberate effort to describe the behavior in terms of what the person did and how they thought, not what context they had access to.
BARS is also not a replacement for interviewer training. An interviewer who does not know how to probe for behavioral evidence — who accepts vague "we" statements without asking "what specifically did you do," who lets candidates narrate future-state hypotheticals instead of specific past events — will produce low-quality STAR responses that cannot be scored accurately regardless of how well-anchored the rubric is.
Maintaining the Rubric Over Time
A BARS rubric is not a static document. Role requirements shift as teams grow. Anchors that were calibrated when the company was 30 people may not capture what "system thinking at scale" means when the company has 300. The rubric should be reviewed at minimum every six months by a small calibration group — ideally including recent hires who can speak to whether the anchors matched what the job actually required in their first year.
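Inter-rater reliability metrics are your ongoing signal that the rubric is working. If two interviewers scoring the same competency in the same round consistently diverge by more than one rating point, the anchor language needs sharpening. Cohen's kappa coefficient, which measures agreement between two raters after correcting for the agreement expected by chance, is the standard measure here: as a rule of thumb, a kappa above 0.6 indicates good reliability; below 0.4 suggests the anchors are too ambiguous to produce consistent scoring. Because the unweighted statistic treats a 3-versus-4 disagreement the same as a 1-versus-5 disagreement, a weighted variant is often a better fit for an ordinal 1-to-5 scale. Tracking this across loops is what turns a rubric from a one-time document into a continuously improving measurement instrument.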
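For readers who want to see the calculation, the sketch below computes unweighted Cohen's kappa for two interviewers who scored the same set of candidate responses. The function name and the sample ratings are hypothetical. It uses the standard definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement rate and p_e is the agreement expected by chance from each rater's score distribution.

```python
from collections import Counter


def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items given the identical score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each score, the product of the two raters'
    # marginal frequencies, summed over all scores.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical score")
    return (observed - expected) / (1 - expected)


# Hypothetical scores from two interviewers on the same eight responses.
scores_a = [3, 4, 2, 5, 3, 4, 3, 2]
scores_b = [3, 4, 3, 5, 3, 3, 3, 2]

print(round(cohens_kappa(scores_a, scores_b), 2))  # 0.64
```

Here the two interviewers agree on six of eight responses (p_o = 0.75) and chance agreement is 0.3125, giving a kappa of about 0.64. With only a handful of responses per loop the estimate is noisy, so it is more informative pooled across several loops than computed per candidate.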
The investment required to build a proper BARS rubric for a single role is real — typically eight to twelve hours for a team doing it carefully for the first time. But that investment compounds. Rubrics can be adapted across similar roles. Anchors built for a staff engineer loop can be inherited and adjusted for a senior engineer loop. Over time, a library of calibrated anchors becomes one of the most durable assets in a hiring organization's toolkit.