intervue.io Java Backend Interview

How to read this page. This is the full intervue.io question bank asked of a candidate interviewing for a Java application role, deduplicated and grouped by skill. Each question is followed by an answer you can deliver in 60–120 seconds. The recurring theme the panel kept probing: can you go past the textbook definition into trade-offs, failure modes, and a concrete implementation? That's exactly where the answers below aim.

01 · Caching Frameworks

Caching strategies & Redis

This was the "must-have" skill on every scorecard. The interviewers rarely wanted a definition — they wanted you to pick a strategy and defend the trade-off: durability vs throughput, consistency vs latency, and what happens the moment the cache lets you down at 3 a.m.

You're designing a payment system where transaction updates must never be lost. High write volume, cache + database. You need durability, cache–DB consistency, and high write throughput. Would you use Write-Through or Write-Behind caching, and how do you decide?

Start by naming the one constraint that dominates: "transaction updates must never be lost." That phrase alone settles the debate before I weigh throughput.

Write-Through — the write goes to the cache and the database synchronously, and only then do I ack the client. Durability is guaranteed because the DB has the record before the user ever sees "success." The cost is write latency: every write pays for the slower of the two stores.
Write-Behind (write-back) — the write lands in the cache, I ack immediately, and a background flush persists to the DB later. Throughput is excellent and latency is tiny, but there's a window where data lives only in the cache. If that node dies before the flush, the write is gone.

Decision: for a payment system, Write-Through. "Never lost" is non-negotiable, and a payment write at human speed (clicking Pay) doesn't need sub-millisecond latency — a few extra milliseconds to hit the DB is a fine price for not losing money.

The nuance that scores points: I wouldn't actually treat the cache as the durability layer at all. The real durability pattern for payments is to write to the database first (the system of record), then update or invalidate the cache. For the high-throughput requirement I'd serve reads through cache-aside rather than force every write through the cache. If someone insists on write-behind for throughput, the only safe version uses a durable write-ahead buffer (e.g. an append-only log or Kafka) so the "behind" part survives a crash. That's the honest comparison the scorecards said candidates skipped: write-behind isn't wrong, it's just only safe when the buffer itself is durable.
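
The durability window is easier to see in code. The following is a minimal in-memory sketch, not a production design: the class name PaymentWrites, the two maps standing in for the database and Redis, and the crashCacheNode() helper are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-ins: "database" plays the system of record, "cache" plays Redis.
final class PaymentWrites {
    final Map<String, String> database = new HashMap<>();
    final Map<String, String> cache = new HashMap<>();
    private final Queue<String[]> pendingFlush = new ArrayDeque<>();

    // Write-through, database first: the caller is acknowledged only after
    // the system of record holds the transaction.
    void writeThrough(String txId, String status) {
        database.put(txId, status);
        cache.put(txId, status);
    }

    // Write-behind: acknowledged after the cache write; the database catches up in flush().
    void writeBehind(String txId, String status) {
        cache.put(txId, status);
        pendingFlush.add(new String[] {txId, status});
    }

    // Background flush of queued writes to the database.
    void flush() {
        String[] write;
        while ((write = pendingFlush.poll()) != null) {
            database.put(write[0], write[1]);
        }
    }

    // Simulates losing the cache node before the flush ran: an in-memory
    // buffer disappears with it, which is why the buffer must be durable.
    void crashCacheNode() {
        cache.clear();
        pendingFlush.clear();
    }
}
```

After a writeThrough followed by crashCacheNode(), the record is still in database; after a writeBehind followed by the same crash, it is gone unless flush() ran first.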

A very popular product page caches for 60 minutes. At the exact second it expires, 10,000 concurrent users hit the page. The cache is empty, so all 10,000 requests hit the database at once and crash it. How do you design the caching layer to prevent this?

This is the classic cache stampede (a.k.a. thundering herd / dog-piling). The fix is layered — I'd mention all four because each closes a different gap:

1. Request coalescing / mutex lock. When the key is missing, only the first request acquires a short-lived lock under a separate lock key (for example SET lock:<key> <token> NX PX 5000 in Redis, so the lock expires on its own after 5 seconds if the holder dies) and goes to the DB; the other 9,999 either wait briefly and re-read, or are served the stale value. One DB hit instead of 10,000.
2. Stale-while-revalidate. Don't hard-expire. Keep serving the old value past TTL while a single background job refreshes it. Users never see an empty cache; the DB sees exactly one refresh query.
3. Randomized / jittered TTL. Instead of every entry expiring at the same 60-minute mark, set TTL = 3600 + random(0, 300). This spreads expirations so a whole population of keys never expires in the same second.
4. Background pre-warming. For known-hot keys (the famous product page), a scheduled job refreshes the entry before it expires, so it's never actually cold.

So-what: the lock handles the single-key spike, jitter handles the many-keys-at-once case, and stale-while-revalidate means the user never pays for the miss.
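
Coalescing and jitter can be sketched inside a single JVM. This is an illustrative sketch under stated assumptions, not a distributed lock: CoalescingCache is a hypothetical class, the clock is injected so expiry can be tested, and it relies on ConcurrentHashMap.compute running the rebuild at most once per key while other callers for that key wait. A multi-node deployment would still need the Redis lock described above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical single-JVM cache: one rebuild per expired key, jittered expiry.
final class CoalescingCache<V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;
        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final long maxJitterMillis;
    private final LongSupplier clock;

    CoalescingCache(long ttlMillis, long maxJitterMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.maxJitterMillis = maxJitterMillis;
        this.clock = clock;
    }

    V get(String key, Function<String, V> loader) {
        long now = clock.getAsLong();
        // compute() is atomic per key: concurrent callers for the same key wait
        // here and then see the fresh entry instead of calling the loader again.
        Entry<V> fresh = entries.compute(key, (k, old) -> {
            if (old != null && old.expiresAtMillis > now) {
                return old; // still valid, no database call
            }
            // Jitter spreads expirations so many keys do not expire in the same instant.
            long jitter = ThreadLocalRandom.current().nextLong(maxJitterMillis + 1);
            return new Entry<>(loader.apply(k), now + ttlMillis + jitter);
        });
        return fresh.value;
    }
}
```

With a 60-minute TTL and up to 5 minutes of jitter this would be constructed as new CoalescingCache<>(3_600_000L, 300_000L, System::currentTimeMillis). Because the loader runs while the map holds the key, this sketch suits short rebuilds; a long rebuild would be better served by the stale-while-revalidate approach.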

During a big sale, the application suddenly receives heavy traffic. The database CPU reaches 100% because a large set of cache entries expired simultaneously. How would you solve this?

This is the sibling problem — cache avalanche. Stampede is many requests for one expired key; avalanche is many different keys all expiring together and collectively burying the DB.

The primary fix is TTL jitter — never let a batch of keys share an exact expiry. Add the same coalescing lock per key so each key only triggers one rebuild. Beyond that, the avalanche-specific defenses:

Tiered TTLs: a short logical TTL plus a longer physical TTL, so even "expired" data is still physically present to serve stale during the rebuild.
A circuit breaker / rate limiter in front of the DB so that even if the cache fully misses, the DB receives a bounded query rate and sheds the rest (serve stale or a graceful "try again") instead of toppling.
Replicate the cache (Redis cluster + replicas) so a single node failing doesn't dump all its keyspace onto the DB at once.

Your e-commerce app uses Redis to cache product and session data. How do you handle it when the cache is destroyed (Redis goes down or restarts cold)?

Two failure modes, two answers.

Product data is regenerable — it lives in the DB. A cold cache means a burst of misses, so I protect the DB exactly as above: request coalescing, a rate limiter / circuit breaker in front of the DB, and gradual cache warming (re-populate top SKUs on startup rather than letting organic traffic stampede). The app should treat a Redis outage as "degraded but alive" — fall back to the DB, don't 500.

Session data is the dangerous one — if it's only in Redis and Redis is wiped, every user is logged out. Mitigations: enable Redis persistence (AOF + RDB) so a restart reloads state, run Redis in a replicated cluster with Sentinel/failover so a single node loss doesn't lose the keyspace, and for true resilience make sessions stateless via signed JWTs so they don't depend on a server-side store at all. The headline: the cache must be a performance layer, never the only copy of anything that matters.
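
The "degraded but alive" behaviour for regenerable data can be sketched as a read path that treats a cache failure like a miss. DegradingReader and its two lookup functions are hypothetical names; in a real service the cache lookup would wrap a Redis client call and the fallback would sit behind the rate limiter described above.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical read path: a cache outage degrades to the database instead of failing.
final class DegradingReader<K, V> {
    private final Function<K, Optional<V>> cacheLookup; // may throw when the cache is unreachable
    private final Function<K, V> databaseLookup;

    DegradingReader(Function<K, Optional<V>> cacheLookup, Function<K, V> databaseLookup) {
        this.cacheLookup = cacheLookup;
        this.databaseLookup = databaseLookup;
    }

    V read(K key) {
        try {
            Optional<V> cached = cacheLookup.apply(key);
            if (cached.isPresent()) {
                return cached.get();
            }
        } catch (RuntimeException cacheUnavailable) {
            // Treat the outage like a miss; the request must not turn into a 500.
        }
        return databaseLookup.apply(key);
    }
}
```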

How does the @Cacheable annotation work internally in Spring Boot?

@Cacheable is AOP under the hood — it isn't magic, it's a proxy. At startup Spring wraps the bean in a proxy; when you call the annotated method, the proxy intercepts the call before your code runs:

It computes a cache key from the method arguments (via a KeyGenerator, or a SpEL key you specify).
It asks the configured CacheManager (e.g. RedisCacheManager, Caffeine) for that key. Cache hit → it returns the cached value and your method body never executes. Cache miss → it invokes the real method, stores the return value under the key, then returns it.

The two gotchas worth naming (the scorecard flagged this): because it's proxy-based, self-invocation (calling an @Cacheable method from another method in the same class via this.) bypasses the proxy and the cache does nothing — same trap as @Transactional. And @CachePut always executes and updates the cache, while @CacheEvict removes entries — those three are the trio for read/write/invalidate.
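
The proxy mechanics and the self-invocation trap can be reproduced with a plain JDK dynamic proxy. This is a hand-rolled stand-in for illustration, not Spring's implementation: PriceService, SlowPriceService, and CachingProxy are hypothetical names, and the cachedMethods set plays the role of the @Cacheable annotation.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

interface PriceService {
    int price(String sku);
    int priceViaSelfCall(String sku);
}

final class SlowPriceService implements PriceService {
    int realCalls = 0; // counts how often the method body actually runs

    @Override
    public int price(String sku) {
        realCalls++;
        return sku.length() * 100;
    }

    @Override
    public int priceViaSelfCall(String sku) {
        return this.price(sku); // goes straight to the target, never through the proxy
    }
}

final class CachingProxy {
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface, Set<String> cachedMethods) {
        Map<String, Object> cache = new ConcurrentHashMap<>();
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                if (!cachedMethods.contains(method.getName())) {
                    return method.invoke(target, args); // not "annotated": pass through
                }
                String key = method.getName() + Arrays.deepToString(args);
                Object hit = cache.get(key);
                if (hit != null) {
                    return hit; // cache hit: the real method body never runs
                }
                Object value = method.invoke(target, args);
                if (value != null) {
                    cache.put(key, value);
                }
                return value;
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the target's own exception
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

Calling price("abc") twice through the wrapped instance runs the real method once. Calling priceViaSelfCall("xy") twice runs it twice, because the inner this.price(...) call never reaches the handler.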

What are caching strategies like LRU and LFU? When two entries have the same frequency and one must be evicted, which one goes?

These are eviction policies — they decide what to drop when the cache is full.

LRU (Least Recently Used) — evict the entry that hasn't been accessed for the longest time. Bets that "recently used → likely used again soon." Implemented with a HashMap + doubly-linked list for O(1).
LFU (Least Frequently Used) — evict the entry with the lowest access count. Bets that "popular over time → stays popular."

The tie-break the interviewer was fishing for: when two LFU entries have the same frequency, you fall back to LRU among them — evict the one that was accessed least recently. That's exactly how a proper LFU (e.g. LeetCode 460) is specified, and it's why a real LFU keeps a recency ordering within each frequency bucket. Practical note: LRU is the default for most caches (Redis offers both via maxmemory-policy) because it's cheaper and resists "cache pollution" from a one-time burst that would inflate LFU counts.
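
The tie-break rule can be made concrete with a small LFU that keeps recency order inside each frequency bucket. This is a sketch of the standard bucket approach under simplifying assumptions (single-threaded, hypothetical class name LfuCache), not a tuned implementation.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;

// Hypothetical LFU cache: evicts the lowest access count, ties broken by least recent use.
final class LfuCache<K, V> {
    private final int capacity;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Integer> counts = new HashMap<>();
    // One bucket per access count; insertion order inside a bucket is recency order.
    private final Map<Integer, LinkedHashSet<K>> buckets = new HashMap<>();
    private int minCount = 0;

    LfuCache(int capacity) {
        this.capacity = capacity;
    }

    V get(K key) {
        if (!values.containsKey(key)) {
            return null;
        }
        touch(key);
        return values.get(key);
    }

    void put(K key, V value) {
        if (capacity <= 0) {
            return;
        }
        if (values.containsKey(key)) {
            values.put(key, value);
            touch(key);
            return;
        }
        if (values.size() >= capacity) {
            evict();
        }
        values.put(key, value);
        counts.put(key, 1);
        buckets.computeIfAbsent(1, c -> new LinkedHashSet<>()).add(key);
        minCount = 1; // a brand-new key always has the lowest count
    }

    // Moves the key from its current bucket to the end of the next one.
    private void touch(K key) {
        int count = counts.get(key);
        LinkedHashSet<K> bucket = buckets.get(count);
        bucket.remove(key);
        if (bucket.isEmpty()) {
            buckets.remove(count);
            if (minCount == count) {
                minCount = count + 1;
            }
        }
        counts.put(key, count + 1);
        buckets.computeIfAbsent(count + 1, c -> new LinkedHashSet<>()).add(key);
    }

    // Lowest count first; within that bucket the first element is the least recently used.
    private void evict() {
        LinkedHashSet<K> bucket = buckets.get(minCount);
        K victim = bucket.iterator().next();
        bucket.remove(victim);
        if (bucket.isEmpty()) {
            buckets.remove(minCount);
        }
        values.remove(victim);
        counts.remove(victim);
    }
}
```

With capacity 2, after put(x), put(y), get(x), get(y), both keys have count 2 and x was touched earlier, so a subsequent put(z) evicts x.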

Why choose Redis over other caching systems?

Three reasons I'd give, in order of weight:

Rich data structures. Memcached stores opaque strings; Redis has lists, hashes, sets, sorted sets, bitmaps, streams, HyperLogLog. A sorted set gives you a leaderboard or a rate-limiter or a delay queue for free — no read-modify-write round trips.
Atomic operations + single-threaded command loop. Because Redis executes commands one at a time, operations like INCR, SETNX, and Lua scripts are atomic without you taking a distributed lock. That's why it's the go-to for counters, locks, and idempotency keys.
Persistence & replication. Unlike a pure in-memory cache, Redis can persist (RDB snapshots / AOF) and replicate with failover, so it doubles as a durable-ish store when you need it.

I'd be honest about the flip side: Memcached's multi-threaded model can edge out Redis for dead-simple, massive-scale string GET/SET, and it's leaner on memory per key. The choice is "do I need structures and atomicity (Redis) or just a big fast blob store (Memcached)?"

02 · Spring Boot

Spring Boot & core Spring

The Spring questions clustered around two things the proxy model makes counter-intuitive: transactions and AOP. Almost every candidate hit the self-invocation trap, so it's worth being airtight here.

A DB operation is supposed to roll back automatically when an exception occurs. But in production, even after an exception is thrown, the transaction still commits. How would you debug why the rollback isn't happening?

I'd walk a checklist of the usual culprits, most-likely first:

1. Checked vs unchecked exception. This is the #1 cause. By default Spring only rolls back on RuntimeException and Error — not on checked exceptions. If the code throws a checked Exception (e.g. IOException), the transaction commits. Fix: @Transactional(rollbackFor = Exception.class).
2. The exception is swallowed. A try/catch inside the method catches it and doesn't rethrow, so Spring never sees an exception and commits. Either rethrow, or mark the transaction explicitly with TransactionAspectSupport.currentTransactionStatus().setRollbackOnly().
3. Self-invocation. The @Transactional method is called from another method in the same class via this.method(), bypassing the proxy — so no transaction exists to roll back.
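4. Method visibility. A private method is never advised, and with interface-based (JDK) proxies only public methods are. Before Spring 6.0 the annotation was silently ignored on protected and package-visible methods too; from 6.0 on, class-based (CGLIB) proxies honor them by default.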
5. Wrong setup. The DB engine doesn't support transactions (e.g. MyISAM in MySQL), or the call runs on a different thread/connection than the transactional one.

How I'd actually debug it: turn on logging.level.org.springframework.transaction=DEBUG to see whether a transaction is even being created and where the commit/rollback boundary is. Nine times out of ten the log shows "no transaction" (self-invocation / non-public) or "committing" after a checked exception.
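
Culprit #1 is easier to remember as a decision function. The sketch below is a plain-Java model of the default rule described above, not Spring's code: RollbackRule and rollsBack are hypothetical names, and the rollbackFor parameter stands in for the annotation attribute of the same name.

```java
// Hypothetical model of the default rollback decision for a transactional method.
final class RollbackRule {
    // thrown: what escaped the method (null means it returned normally).
    // rollbackFor: extra exception types that should also trigger a rollback.
    static boolean rollsBack(Throwable thrown, Class<?>... rollbackFor) {
        if (thrown == null) {
            return false; // nothing escaped (or it was swallowed): commit
        }
        if (thrown instanceof RuntimeException || thrown instanceof Error) {
            return true; // unchecked: rolls back by default
        }
        for (Class<?> type : rollbackFor) {
            if (type.isInstance(thrown)) {
                return true; // checked, but explicitly listed
            }
        }
        return false; // checked and not listed: commit
    }
}
```

Under this model an escaping IOException commits, the same exception with rollbackFor = Exception.class rolls back, and a swallowed exception looks identical to a normal return.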

You have a @Service class. Method A is not annotated; Method B has @Transactional. If Method A calls Method B internally via this.methodB(), will the transaction start?

No. The transaction will not start. Spring's @Transactional works through a proxy that wraps the bean. The transactional behavior is applied when a call comes through that proxy from outside. When A calls this.methodB(), the call goes directly to the target object — the proxy is never involved — so the transactional advice never fires.

Fixes I'd offer:

Move B to a separate bean and inject it — now the call crosses a proxy boundary. (Cleanest.)
Self-inject the proxy (@Autowired private MyService self;) and call self.methodB().
Use AopContext.currentProxy() (requires exposeProxy = true) — works but ugly.

The same proxy rule explains why @Cacheable and @Async also silently do nothing on self-invocation.

This code has architectural flaws around bean scopes and self-invocation. Fix it: a @Scope("prototype") DataProcessor is @Autowired into a singleton ReportService (and should be new every call), and a completeTask() method calls this.updateDatabase() which is @Transactional.

Two distinct bugs, both rooted in how Spring wires beans.

Flaw 1 — prototype injected into a singleton. ReportService is a singleton, created once. The @Autowired DataProcessor is injected once at startup, so even though DataProcessor is prototype-scoped, generateReport() reuses the same instance forever — defeating "new every time." Fix: don't inject the instance, inject a factory:

@Autowired
private ObjectProvider<DataProcessor> processorProvider;

public void generateReport() {
    DataProcessor processor = processorProvider.getObject(); // fresh prototype each call
    processor.process();
}

(ObjectProvider / @Lookup method injection / a Provider<T> all work; ObjectProvider is the modern idiom.)

Flaw 2 — self-invocation on @Transactional. completeTask() calls this.updateDatabase(), so the proxy is bypassed and the DB update is not transactional. Fix: move updateDatabase() to a separate bean and call it through the injected dependency, or self-inject the proxy and call self.updateDatabase().

The third "flaw" the prompt hints at: a stateful prototype bean (id = UUID) being treated as if it resets per call — which is really the same scope-mismatch lesson. Net: never hold prototype state behind a singleton without a provider.

What is AOP and how do you use it in Spring Boot? Explain Join Point, Advice, and Pointcut, and the types of advice.

AOP (Aspect-Oriented Programming) lets you pull out cross-cutting concerns — logging, security, transactions, metrics — that would otherwise be copy-pasted across every method, and centralize them in one place called an aspect. Spring implements it with runtime proxies.

The vocabulary, with a mental model:

Join Point — a point in execution where an aspect could be applied. In Spring, that's always a method execution.
Pointcut — an expression that selects which join points to advise, e.g. execution(* com.app.service..*(..)). It's the "where."
Advice — the code that runs at a matched join point. It's the "what."
Aspect — the class (@Aspect) bundling pointcuts + advice together.

The five advice types:

@Before — runs before the method.
@AfterReturning — runs after it returns successfully (can read the return value).
@AfterThrowing — runs if it throws.
@After — runs always (finally-style).
@Around — wraps the method; gets a ProceedingJoinPoint and decides whether/when to call proceed(). This is the most powerful — it's how you time a method or short-circuit it.

Implement logging + performance metrics for all service methods except one specific package — and log execution time of every service method without modifying the service classes.

This is the canonical "use @Around advice" answer — and the "without modifying the service classes" phrase is the giveaway that AOP is the intended tool. One aspect handles both:

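import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class MetricsAspect {

  private static final Logger log = LoggerFactory.getLogger(MetricsAspect.class);

  // match all service methods EXCEPT the excluded package
  @Around("execution(* com.app.service..*(..)) " +
          "&& !execution(* com.app.service.internal..*(..))")
  public Object logAndTime(ProceedingJoinPoint pjp) throws Throwable {
    long start = System.nanoTime();
    try {
      return pjp.proceed();                   // run the real method and pass its result through
    } finally {
      // finally: the timing is logged whether the method returns or throws
      long ms = (System.nanoTime() - start) / 1_000_000;
      log.info("{} took {} ms", pjp.getSignature(), ms);
    }
  }
}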

The exclusion is done right in the pointcut with && !execution(... .internal..*(..)). Because it's a proxy aspect, none of the service classes are touched — exactly the requirement. For real metrics I'd push the timing to Micrometer (@Timed or a Timer) so it lands in Prometheus/Grafana instead of just logs.

What are auto-configurations in Spring Boot?

Auto-configuration is the mechanism that makes Spring Boot "just work" without XML. At startup, @EnableAutoConfiguration (folded into @SpringBootApplication) scans the classpath and conditionally registers beans based on what it finds.

The engine is conditional annotations: @ConditionalOnClass (is the library on the classpath?), @ConditionalOnMissingBean (did the developer not already define one?), @ConditionalOnProperty, etc. So if it sees spring-boot-starter-data-jpa + an H2 driver and you haven't defined a DataSource, it wires a sensible one for you. The candidate gets credit for naming the key principle: your beans always win — auto-config backs off (@ConditionalOnMissingBean) the moment you define your own. The list of auto-config classes lives in META-INF/spring/...AutoConfiguration.imports (formerly spring.factories), and you can debug what fired with --debug (the "conditions evaluation report").

How many types of dependency injection exist in Spring?

Three:

Constructor injection — dependencies passed to the constructor. Preferred. It makes dependencies mandatory and final (immutable), prevents the object from existing in a half-built state, and makes the class trivially unit-testable. Spring injects it without even needing @Autowired if there's a single constructor.
Setter injection — via setter methods. Good for optional or reconfigurable dependencies.
Field injection — @Autowired directly on the field. Concise but discouraged: you can't make the field final, it hides dependencies, and it can't be set in a plain unit test without reflection.

The senior signal is recommending constructor injection and explaining why — and noting it also surfaces circular-dependency problems at startup instead of hiding them.
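
The testability argument is visible in a few lines of plain Java. TaxPolicy and InvoiceService are hypothetical names used only for illustration; in a Spring application the class would carry @Service and the container would call the single constructor.

```java
import java.util.Objects;

interface TaxPolicy {
    long taxCents(long netCents);
}

final class InvoiceService {
    private final TaxPolicy taxPolicy; // final: assigned once, cannot be left unset

    // The only way to build the service is to hand it its dependency.
    InvoiceService(TaxPolicy taxPolicy) {
        this.taxPolicy = Objects.requireNonNull(taxPolicy, "taxPolicy");
    }

    long totalCents(long netCents) {
        return netCents + taxPolicy.taxCents(netCents);
    }
}
```

A unit test needs no container and no reflection: new InvoiceService(net -> net / 10) supplies a fake policy directly.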

How do you handle circular dependency issues in Spring?

A circular dependency is A needs B and B needs A. With constructor injection Spring can't build either (neither can be constructed first) and fails fast at startup with BeanCurrentlyInCreationException — which is actually a good thing, it's telling you the design is wrong.

Options, best to worst:

Refactor — the real fix. A circular dependency usually means a missing third class. Extract the shared logic into a new bean both depend on, breaking the cycle. This is what I'd push for.
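Setter / field injection on one side: Spring can create both beans, then wire the setter afterward, breaking the construction deadlock. Note that Spring Boot 2.6 and later reject circular references by default, so this only works with spring.main.allow-circular-references=true.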
@Lazy on one dependency — Spring injects a proxy and resolves the real bean on first use.

03 · Software Architecture

Microservices design & patterns

This was the most decisive section on the scorecards — strong candidates moved past naming a pattern into when to use it and what breaks without it. The recurring gap was implementation-level depth on resilience: timeouts, retries, and circuit breakers.

In a microservices system, one service calls another over the network. Calls can be slow or fail intermittently, and a stuck dependency shouldn't bring down the whole system. How do you implement cross-service communication with proper timeout and retry logic?

I'd build defense in three layers and explain the order they fire:

1. Timeouts (the floor). Every network call gets an explicit connect + read timeout. Without one, a hung dependency holds your thread forever and the failure cascades — your thread pool fills, you stop serving healthy requests too. A timeout is the single most important line of defense.
2. Retries with exponential backoff + jitter. For transient failures (a blip, a timeout), retry — but only on idempotent operations, with backoff (100ms, 200ms, 400ms…) and randomized jitter so retries don't synchronize into a self-inflicted DDoS. Cap at 2–3 attempts.
3. Circuit breaker (the fuse). If a dependency is failing consistently, retrying just piles on. The circuit breaker tracks the failure rate and, past a threshold, opens: it fails fast for a cooldown window instead of waiting on timeouts. After the window it goes half-open, lets a trial request through, and closes again if it succeeds. This is what stops one sick service from dragging down its callers.

In Spring I'd implement all three with Resilience4j (annotations @TimeLimiter, @Retry, @CircuitBreaker) plus a fallback method that returns cached/default data so the user sees graceful degradation, not a 500. Pair it with a bulkhead (separate thread pool per dependency) so a slow dependency can't exhaust the shared pool.
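
For the retry layer specifically, the backoff arithmetic can be sketched without any library. BackoffRetry is a hypothetical helper, not the Resilience4j API; the sleeper is injected so the delays can be observed, and it assumes the operation is idempotent and signals transient failure with a RuntimeException.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Hypothetical retry helper: exponential backoff plus random jitter, bounded attempts.
final class BackoffRetry {
    // Delay before retry number `retry` (1-based): base * 2^(retry-1),
    // plus a random jitter in [0, base) so callers do not retry in lockstep.
    static long delayMillis(int retry, long baseMillis) {
        long exponential = baseMillis << (retry - 1);
        return exponential + ThreadLocalRandom.current().nextLong(baseMillis);
    }

    static <T> T call(Supplier<T> operation, int maxAttempts, long baseMillis, LongConsumer sleeper) {
        if (maxAttempts < 1 || baseMillis < 1) {
            throw new IllegalArgumentException("maxAttempts and baseMillis must be positive");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleeper.accept(delayMillis(attempt, baseMillis)); // wait before the next try
                }
            }
        }
        throw last; // attempts exhausted: surface the final failure to the caller
    }
}
```

With a base of 100 ms the first retry waits between 100 and 199 ms and the second between 200 and 299 ms. In production the sleeper would block the calling thread (or schedule the retry), and the operation itself would already carry its connect and read timeouts.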

What is the purpose of the Circuit Breaker pattern, and how do you implement fault tolerance in microservices?

Purpose: prevent cascading failure. Analogy: it's an electrical fuse. If a downstream service is down, naively every caller keeps sending requests that pile up on timeouts, exhaust thread pools, and the failure spreads upstream until the whole system is down. The circuit breaker detects the failure and "trips," failing fast so callers stay healthy and the struggling service gets breathing room to recover.

Three states: Closed (normal, requests flow, failures counted) → trips to Open when failure rate crosses a threshold (requests fail instantly, no call made) → after a timeout moves to Half-Open (a few trial requests; success → Closed, failure → back to Open).

Fault tolerance beyond the breaker: timeouts, retries with backoff, bulkheads (isolate resources per dependency), fallbacks (serve cached/default responses), and rate limiting. In code, Resilience4j wires it up:

@CircuitBreaker(name = "inventory", fallbackMethod = "fallbackStock")
@Retry(name = "inventory")
public Stock getStock(String sku) { return client.call(sku); }

public Stock fallbackStock(String sku, Throwable t) {
    return Stock.unknown(sku); // graceful degradation
}
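
The three states can also be sketched as a small state machine, which helps when an interviewer asks what the library is doing. This is a simplified illustration, not Resilience4j's implementation: SimpleCircuitBreaker is a hypothetical class, it counts consecutive failures rather than a failure rate over a sliding window, and the clock is injected so the cooldown can be tested.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical breaker: CLOSED -> OPEN after N consecutive failures,
// OPEN -> HALF_OPEN after a cooldown, HALF_OPEN -> CLOSED on success.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN; // cooldown elapsed: allow a trial request
        }
        return state;
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get(); // fail fast: the dependency is not called at all
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN; // trip, or re-trip after a failed trial
            openedAt = clock.getAsLong();
        }
    }
}
```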

There are two endpoints, /getDetails and /updateDetails, and traffic is high. What changes would you make to the database/services so the API never shows latency?

The key insight: these two endpoints have opposite profiles — /getDetails is read-heavy, /updateDetails is write-heavy — so I'd stop treating them as one workload. This is a read/write separation (CQRS-lite) story:

Cache the reads. Put /getDetails behind Redis (cache-aside). Most reads never touch the DB — this is the single biggest latency win.
Read replicas. Route /getDetails to read replicas and /updateDetails to the primary. Reads no longer compete with writes for the same box.
Async the writes. If /updateDetails doesn't need a synchronous result, accept it, drop it on a queue (Kafka/SQS), ack immediately, and process it with consumers. The user-facing latency becomes "enqueue time."
Invalidate / update the cache on write so /getDetails doesn't serve stale data after an update — and accept eventual consistency if you're reading from a replica.
Connection pooling + horizontal scaling of stateless service instances behind a load balancer to absorb the concurrency.

So-what: reads get faster by never hitting the DB, writes get faster by not blocking on a response, and the two stop interfering with each other.

What is the SAGA design pattern, and when do you implement it?

A Saga manages a transaction that spans multiple microservices, each with its own database. You can't use a single ACID transaction across services (no shared DB, and distributed 2-phase commit is slow and brittle), so a Saga breaks the work into a sequence of local transactions, where each step publishes an event that triggers the next. If a step fails, the Saga runs compensating transactions to undo the prior steps — semantic rollback, not a DB rollback.

Hotel-booking example: Reserve Room → Charge Payment → Issue Confirmation. If payment fails, the compensating action is "release the room." Two flavors:

Choreography — each service listens for events and reacts. Decentralized, no coordinator; great for a few steps, but the flow becomes hard to follow as it grows.
Orchestration — a central orchestrator tells each service what to do and tracks state. Easier to reason about and monitor; the orchestrator is one more thing to run.

When: any business workflow that must stay consistent across service boundaries — order processing, booking, payments — where you accept eventual consistency in exchange for service autonomy.
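
The orchestration flavor reduces to "run steps in order, and on failure undo the completed ones in reverse." The sketch below assumes hypothetical names (SagaOrchestrator, Step) and in-process Runnable steps; a real saga would persist its progress and invoke remote services, and would need a policy for compensations that themselves fail.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical in-process orchestrator illustrating compensation order.
final class SagaOrchestrator {
    static final class Step {
        final Runnable action;
        final Runnable compensation;

        Step(Runnable action, Runnable compensation) {
            this.action = action;
            this.compensation = compensation;
        }
    }

    // Returns true if every step committed; false if a step failed and the
    // already-completed steps were compensated in reverse order.
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException failure) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run(); // semantic undo, newest first
                }
                return false;
            }
        }
        return true;
    }
}
```

For the hotel example, a failing "Charge Payment" step triggers only "release the room": the failed step has nothing to undo, and "Issue Confirmation" never runs.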

What is the CQRS design pattern, and when would you use it?

CQRS = Command Query Responsibility Segregation. You split the model into two: a write side (commands) that handles state changes, and a read side (queries) that handles reads — often backed by different data stores optimized for each. The write model might be a normalized Postgres optimized for integrity; the read model might be a denormalized Elasticsearch or a materialized view optimized for fast queries, kept in sync via events.

When to use it: when reads and writes have wildly different scale or shape — e.g. a system that's read 100× more than it's written, or where the query needs a denormalized view that's expensive to compute on the write model. It pairs naturally with event sourcing and with the read/write-separation answer above.

When NOT to: simple CRUD. CQRS adds eventual consistency between the two sides and real operational complexity — it's over-engineering for a basic app. I'd name that trade-off explicitly; that's the maturity the panel looks for.
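
A toy version makes both the split and the eventual-consistency cost visible. All names here (StockChanged, StockCommands, StockView) are hypothetical, and the event hand-off is a plain method call where a real system would use a broker.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StockChanged {
    final String sku;
    final int delta;

    StockChanged(String sku, int delta) {
        this.sku = sku;
        this.delta = delta;
    }
}

// Write side: enforces the invariant and records each change as an event.
final class StockCommands {
    private final Map<String, Integer> onHand = new HashMap<>();
    private final List<StockChanged> pending = new ArrayList<>();

    void adjust(String sku, int delta) {
        int next = onHand.getOrDefault(sku, 0) + delta;
        if (next < 0) {
            throw new IllegalArgumentException("insufficient stock for " + sku);
        }
        onHand.put(sku, next);
        pending.add(new StockChanged(sku, delta));
    }

    // Hands over the events recorded so far and clears the pending list.
    List<StockChanged> drainEvents() {
        List<StockChanged> drained = new ArrayList<>(pending);
        pending.clear();
        return drained;
    }
}

// Read side: a view shaped for the query, updated only from events.
final class StockView {
    private final Map<String, Integer> available = new HashMap<>();

    void apply(StockChanged event) {
        available.merge(event.sku, event.delta, Integer::sum);
    }

    String availability(String sku) {
        return available.getOrDefault(sku, 0) > 0 ? "IN_STOCK" : "SOLD_OUT";
    }
}
```

Between adjust(...) on the write side and apply(...) on the read side, the view still reports the old answer. That lag is the eventual consistency the trade-off above refers to.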

What deployment strategy do you follow, and what is blue-green deployment?

Blue-green runs two identical production environments. "Blue" is live and serving all traffic. You deploy the new version to "Green" (idle), smoke-test it in isolation, then flip the load balancer to send all traffic to Green. Blue stays warm as an instant rollback — if Green misbehaves, flip back in seconds. The benefit is zero-downtime releases and a trivial rollback path; the cost is running double the infrastructure during the switch, plus handling DB migrations carefully (they must be backward-compatible since both versions may briefly read the same schema).

I'd contrast it with the two main alternatives to show range — every strategy is really juggling the same three things: blast radius (how many users a bad release hits), rollback speed, and infra cost.

Canary — instead of flipping everyone at once, route a small slice (say 5%) of real production traffic to the new version, watch its metrics (error rate, p99, business KPIs), and ramp up gradually 5% → 25% → 100% only if it stays healthy. Its superpower is the smallest blast radius: a bug hurts 5% of users for a few minutes, not everyone. The cost is it's the slowest rollout and needs real traffic-splitting (a service mesh / smart ingress) plus strong observability to actually read the canary's health.
Rolling — the Kubernetes default. No second environment; you just upgrade your existing instances in small batches (take 2 down, bring them up on the new version, wait for health checks, repeat). The win is no extra infra — you reuse the same servers. The catch is there's no untouched old environment standing by, so a rollback means running another rolling deploy in reverse (minutes, not the seconds blue-green gives you), and because old + new run simultaneously during the rollout, the new version must be backward-compatible (same DB schema, compatible APIs) or the old instances start erroring mid-deploy.

Strategy	Blast radius	Rollback	Extra infra
Blue-Green	Everyone, until you flip back	Instant	High (2× during cutover)
Canary	Tiny (~5%)	Fast (reroute the slice)	Low–moderate
Rolling	Grows as batches flip	Slow (reverse rollout)	None

So the choice comes down to risk tolerance vs budget: need instant rollback and can afford double infra → blue-green; need minimum blast radius on high-stakes user traffic → canary; want zero downtime at no extra cost for routine releases → rolling. They also compose — a common pattern is canary-on-top-of-rolling — and the thread tying all of them together is backward-compatible DB migrations, since every strategy except a clean blue-green cut runs two versions against one database at some point.

Explain a scalable microservice architecture and its components. (Also: design an inventory service — high-level design.)

I'd draw it and narrate each box by what it solves — the panel explicitly rewarded structure (requirements → components → trade-offs) and dinged answers that listed services with "no API gateway, no cache, no DB."

flowchart LR
  U([① Client])
  CDN[② CDN]
  GW["③ API Gateway<br/>auth · rate-limit · routing"]
  ORD[④ Order svc]
  INV[④ Inventory svc]
  PAY[④ Payment svc]
  CACHE[⑤ Redis]
  MQ[(⑥ Kafka)]
  DB[("⑦ Postgres<br/>+ replicas")]
  OBS["⑧ Observability<br/>logs · traces · metrics"]
  U --> CDN --> GW
  GW --> ORD & INV & PAY
  INV --> CACHE
  ORD --> DB
  INV --> DB
  PAY --> DB
  ORD --> MQ --> INV
  ORD -.metrics.-> OBS
  INV -.metrics.-> OBS
  style U fill:#171d27,stroke:#4dfeee,color:#d4dae5
  style CDN fill:#e8743b,stroke:#e8743b,color:#fff
  style GW fill:#4a90d9,stroke:#4a90d9,color:#fff
  style ORD fill:#38b265,stroke:#38b265,color:#fff
  style INV fill:#38b265,stroke:#38b265,color:#fff
  style PAY fill:#38b265,stroke:#38b265,color:#fff
  style CACHE fill:#9b72cf,stroke:#9b72cf,color:#fff
  style MQ fill:#d4a838,stroke:#d4a838,color:#fff
  style DB fill:#3cbfbf,stroke:#3cbfbf,color:#fff
  style OBS fill:#ec5d8a,stroke:#ec5d8a,color:#fff

② CDN — caches static + public reads at the edge; without it, every read pays cross-region latency.
③ API Gateway — single entry point: TLS termination, JWT auth, rate limiting, routing. Without it every service re-implements auth and there's no central throttle.
④ Services — each owns its data and deploy cadence (database-per-service). The inventory service owns the stock table; nobody else touches it directly.
⑤ Redis — hot stock lookups, sessions, idempotency keys; without it the DB takes 10× the read load.
⑥ Kafka — async backbone so the order service doesn't block on inventory; without it, a slow downstream topples the chain.
⑦ Postgres + replicas — system of record, writes on primary, reads on replicas.
⑧ Observability — centralized logs (ELK), distributed tracing (Zipkin/Jaeger), metrics + alerting (Prometheus/Grafana). The scorecard flagged that strong answers proactively mention alerting on P99 latency, not just dashboards.

Naming the boxes isn't enough — the panel also probed what travels along each arrow. Trace one customer, Riya, buying the last pair of shoes, and narrate every hop:

① Client → ② CDN — Riya's phone hits the nearest edge node, not your origin. If it's a static/public read (image, product JSON), the CDN answers from its edge cache and the request dies here — ~95% of read traffic never goes further. Without this hop every read pays full cross-region latency.
② CDN → ③ API Gateway — only dynamic calls (a write, live stock) the CDN can't answer get forwarded. The request carries Riya's JWT; the gateway terminates TLS → validates the token → checks the rate limit → routes by path. One place for auth + throttling so no service re-implements it.
③ API Gateway → ④ Order svc — GW --> ORD & INV & PAY means it routes to whichever service owns the path, not all three. For "Buy" → POST /orders to Order svc. The gateway only ever calls a service's public API — never its DB.
④ Inventory svc → ⑤ Redis — before reserving, Inventory asks the cache first (GET stock:shoes-123, sub-millisecond). Redis also holds idempotency keys, so Riya's double-tap on "Buy" is caught as a duplicate. Without it the DB takes ~10× the read load.
④ Order / Inventory / Payment → ⑦ Postgres — each service writes its own tables (database-per-service). Writes hit the primary; reads (history, reports) hit replicas so reporting doesn't slow live writes. The cache is fast but volatile — Postgres is the durable truth.
④ Order → ⑥ Kafka → ④ Inventory — the arrow most people draw wrong. Order does not call Inventory over HTTP. It commits the order, publishes OrderCreated to Kafka, and instantly returns "200 OK" to Riya. Inventory is a consumer that pulls the event on its own schedule and reserves stock. So if Inventory is slow or down, Riya's checkout still succeeds and the event waits in the queue — and the queue absorbs Black-Friday spikes. The price is eventual consistency (brief window where the order exists but stock isn't reserved yet → see the outbox / idempotency Q below).
④ Order / Inventory ⤏ ⑧ Observability — the dotted arrows are deliberately different: fire-and-forget logs, trace spans (a trace ID following Riya across every hop), and metrics. Dotted because if observability is down, the order must still succeed — telemetry never blocks business logic. When checkout latency spikes, the trace ID tells you which hop was slow.

The whole flow in one breath: Riya taps Buy → CDN can't answer a write so forwards to the Gateway → Gateway authenticates & routes to Order svc → Order writes to Postgres, publishes OrderCreated to Kafka, and instantly tells Riya "Order placed" → meanwhile Inventory svc consumes that event, checks Redis for stock, reserves it, writes the new count to Postgres → every hop fires a trace span into Observability. The tie-together line: solid arrows are the synchronous path Riya waits on; the Kafka & Observability arrows are async — that's how the system stays fast and debuggable without making the customer wait for everything.

Close with scaling levers: stateless services behind a load balancer + autoscaling (Kubernetes), DB read replicas then sharding, and the next bottleneck named out loud.

04 · Event-Driven Architecture

Event-driven architecture & Kafka

EDA was where candidates generally scored highest — the panel wanted the consistency and reliability patterns: the transactional outbox, ordering guarantees, idempotent consumers, and what to do with poison messages.

What is EDA, when should you implement it, and why?

Event-Driven Architecture is a style where services communicate by producing and consuming events ("OrderPlaced", "PaymentCaptured") through a message broker (Kafka, SQS, RabbitMQ), instead of calling each other synchronously over HTTP. The producer fires an event and moves on; consumers react independently.

Why / when:

Decoupling — the order service doesn't know or care who consumes "OrderPlaced." You can add a new consumer (analytics, notifications) without touching the producer.
Resilience — if a consumer is down, events queue up and are processed when it recovers; in a synchronous call that downstream outage would fail the whole request.
Scalability / load-leveling — the queue absorbs traffic spikes; consumers drain at their own pace instead of being overwhelmed.
Async work — anything that doesn't need to be in the request path (email, invoices, shipping labels) belongs here.

When not to: when you need an immediate synchronous answer (e.g. "is this card valid right now?"), or for simple flows where the added complexity (eventual consistency, debugging across async hops, duplicate handling) isn't worth it.

Your company is moving from a synchronous REST order-processing system to event-driven. Order placement, payment, inventory reservation, and shipping must operate independently but maintain a consistent order lifecycle. How do you design it?

Design it as a Saga over Kafka — each service does its local transaction and emits an event that drives the next step, with compensations for failure.

sequenceDiagram
  actor User
  participant Order as Order Service
  participant Bus as Kafka
  participant Pay as Payment
  participant Inv as Inventory
  participant Ship as Shipping
  User->>Order: Place order
  Order->>Bus: OrderCreated
  Bus->>Pay: consume OrderCreated
  Pay->>Bus: PaymentCaptured
  Bus->>Inv: consume PaymentCaptured
  Inv->>Bus: InventoryReserved
  Bus->>Ship: consume InventoryReserved
  Ship->>Bus: Shipped
  Note over Pay,Inv: On failure → emit compensating event (PaymentRefunded / OrderCancelled)

Order service creates the order in PENDING and emits OrderCreated. It owns the lifecycle state and advances it as events come back.
Payment consumes OrderCreated, charges, emits PaymentCaptured (or PaymentFailed).
Inventory consumes PaymentCaptured, reserves stock, emits InventoryReserved (or OutOfStock → compensate: refund payment).
Shipping consumes InventoryReserved, emits Shipped; order moves to COMPLETED.

Three things that make it actually consistent (and that the scorecard wanted to hear): (1) the transactional outbox so the DB write and the event publish can't diverge; (2) idempotent consumers keyed on event ID so a redelivery doesn't double-charge; (3) compensating transactions for each step so a late failure unwinds cleanly. I'd choose orchestration if the flow is long enough that tracing choreographed events gets painful.
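The lifecycle that the Order service owns can be sketched as a small state machine. This is only a minimal sketch: PENDING, COMPLETED and the event names come from the steps above, while the intermediate states PAID, RESERVED and CANCELLED and the class name OrderSagaSketch are assumptions made for illustration. A real service would persist the state and publish compensations through the outbox.

```java
import java.util.ArrayList;
import java.util.List;

class OrderSagaSketch {
    // PENDING and COMPLETED are named in the answer; PAID, RESERVED, CANCELLED are assumed names.
    enum State { PENDING, PAID, RESERVED, COMPLETED, CANCELLED }

    State state = State.PENDING;
    final List<String> compensations = new ArrayList<>(); // compensating events to emit

    // Advance the lifecycle for one incoming event.
    // Duplicate, unknown, or out-of-order events leave the state unchanged.
    void on(String event) {
        switch (event) {
            case "PaymentCaptured":
                if (state == State.PENDING) state = State.PAID;
                break;
            case "InventoryReserved":
                if (state == State.PAID) state = State.RESERVED;
                break;
            case "Shipped":
                if (state == State.RESERVED) state = State.COMPLETED;
                break;
            case "PaymentFailed":
                if (state == State.PENDING) state = State.CANCELLED;
                break;
            case "OutOfStock":
                if (state == State.PAID) {
                    compensations.add("PaymentRefunded"); // undo the earlier charge
                    state = State.CANCELLED;
                }
                break;
            default:
                break;
        }
    }
}
```

Because each transition checks the current state first, a redelivered event is a no-op, which is the same idempotency property the consumers need.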

A Spring Boot service updates a record in PostgreSQL and then must publish a "Success Event" to Kafka. How do you guarantee the event is sent only if the DB transaction commits, and that it isn't lost if the Kafka broker is temporarily down?

This is the dual-write problem, and the answer is the Transactional Outbox pattern — the candidates who named it scored 8/10 here.

The trap: if you write to the DB and then publish to Kafka as two separate operations, they can diverge. If the DB commits but Kafka is down → event lost. If Kafka publishes but the DB rolls back → phantom event. You can't wrap two different systems in one atomic transaction reliably.

The fix: turn two writes into one. In the same local DB transaction, write the business record and insert the event into an outbox table. Because it's one transaction, they commit or roll back together — the event exists if and only if the data changed.

@Transactional
public void placeOrder(Order o) {
    orderRepo.save(o);                          // business write
    outboxRepo.save(new OutboxEvent("OrderPlaced", o)); // same txn
}                                            // both commit atomically

A separate relay process then reads unpublished rows from the outbox and pushes them to Kafka, marking them sent. If Kafka is down, the rows simply stay in the outbox and the relay retries — nothing is lost. The relay is implemented either by polling the table or, better, with Change Data Capture (Debezium) tailing the Postgres WAL. Consumers must be idempotent because the relay guarantees at-least-once delivery.
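The polling flavour of the relay can be sketched in plain Java. This is a simplified illustration, not a production relay: the outbox is an in-memory list instead of a table, and the `publish` predicate is a hypothetical stand-in for a Kafka send that returns false when the broker is unavailable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class OutboxRelaySketch {
    static class OutboxRow {
        final String eventId;
        final String payload;
        boolean sent = false;
        OutboxRow(String eventId, String payload) {
            this.eventId = eventId;
            this.payload = payload;
        }
    }

    // One polling pass: try to publish each unsent row in insertion order.
    // Returns the number of rows published in this pass.
    static int relayOnce(List<OutboxRow> outbox, Predicate<OutboxRow> publish) {
        int published = 0;
        for (OutboxRow row : outbox) {
            if (row.sent) continue;
            if (!publish.test(row)) break; // broker down: stop, keep order, retry on the next pass
            row.sent = true;               // mark sent only after a successful publish
            published++;
        }
        return published;
    }
}
```

If the process dies between the publish and the `sent = true` update, the row is published again on the next pass. That gap is exactly why delivery is at-least-once and consumers have to be idempotent.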

Your inventory service must react to order creation asynchronously. (1) How do you design the producer–consumer flow? (2) How do you prevent duplicate event processing? (3) How do you guarantee ordering? (4) What if the consumer crashes after processing but before committing the offset?

(1) Producer–consumer flow: the order service produces an OrderCreated event to a Kafka topic; the inventory service is a consumer in a consumer group that reads it and reserves stock. Use the outbox pattern on the producer side so the event reflects a committed order.

(2) Prevent duplicates (idempotency): Kafka is at-least-once, so duplicates will happen. Make the consumer idempotent — give every event a unique ID and track processed IDs (a processed_events table or a Redis set). Before processing, check "have I seen this ID?"; if yes, skip. Alternatively make the operation itself naturally idempotent (e.g. set stock to an absolute value rather than decrementing).

(3) Guarantee ordering: Kafka only guarantees order within a partition. So I partition by a key that must stay ordered — here orderId (or productId). All events for the same order go to the same partition and are consumed in order. There's no global ordering across partitions, and that's a deliberate trade for parallelism.

(4) Consumer crashes after processing, before committing offset: on restart, the consumer re-reads from the last committed offset and reprocesses that event — which is exactly why (2) matters. The combination is "process the work, then commit the offset, and make processing idempotent" so the inevitable reprocessing is harmless. For stronger guarantees you can use Kafka transactions (exactly-once semantics) tying the processing write and the offset commit together, but idempotent consumer + manual offset commit after processing covers the vast majority of cases.
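Points (2) and (4) can be illustrated together with a small sketch. It assumes an in-memory set standing in for the processed_events table and a hypothetical class name, IdempotentInventoryConsumer; in a real service the ID insert and the stock update would share one database transaction.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class IdempotentInventoryConsumer {
    private final Set<String> processedEventIds = new HashSet<>(); // stands in for processed_events
    private final Map<String, Integer> stock = new HashMap<>();

    IdempotentInventoryConsumer(String productId, int initialStock) {
        stock.put(productId, initialStock);
    }

    // Returns true if the event was applied, false if it was a duplicate and skipped.
    boolean onOrderCreated(String eventId, String productId, int quantity) {
        if (!processedEventIds.add(eventId)) return false; // Set.add returns false when already seen
        stock.merge(productId, -quantity, Integer::sum);   // reserve stock exactly once per event ID
        return true;
    }

    int stockOf(String productId) {
        return stock.getOrDefault(productId, 0);
    }
}
```

A redelivery after a crash hits the `add` check and is skipped, so reprocessing from the last committed offset does no harm.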

How do you maintain message order in Kafka, and how do you implement it?

The rule: Kafka preserves order only within a single partition, not across a topic. So to keep related messages ordered, you must route them to the same partition — and Kafka does that by hashing the message key. Same key → same partition → ordered.

// all events for an order keep their relative order
producer.send(new ProducerRecord<>("orders", order.getId(), event));
                                  //        ^ key = orderId

Two implementation gotchas worth naming: (1) set max.in.flight.requests.per.connection=1, or enable the idempotent producer (enable.idempotence=true), which keeps ordering with up to 5 in-flight requests, so retries don't reorder messages; (2) ordering means events for one key are serialized to one partition, so a hot key limits parallelism: you trade throughput for order. If you need both, you pick a key granularity that's ordered enough (per-order) while still spreading load across partitions (many orders).
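The "same key, same partition" rule is easy to see in a toy version. Kafka's default partitioner hashes the serialized key bytes with murmur2; the sketch below swaps in String.hashCode purely for illustration, and the class and method names are hypothetical.

```java
class KeyPartitionSketch {
    // Map a key to one of numPartitions partitions.
    // Illustration only: uses String.hashCode rather than Kafka's murmur2 hash.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions); // floorMod keeps the result non-negative
    }
}
```

The mapping depends on the partition count, so adding partitions to a topic can send an existing key to a different partition and break ordering for that key across the change.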

A malformed event keeps crashing the consumer. How do you handle this?

This is the poison pill problem: a bad message that can't be processed, so the consumer fails, retries the same message, fails again, and is stuck — blocking the whole partition and never advancing.

The fix is a Dead Letter Queue (DLQ):

Wrap processing in try/catch. On a transient error, retry with backoff a bounded number of times (e.g. 3).
On a non-recoverable error (malformed payload, deserialization failure) — or after retries are exhausted — publish the bad message to a dedicated DLQ topic and commit the offset so the consumer moves on. The partition is unblocked; healthy messages keep flowing.
The DLQ is then inspected, alerted on, fixed, and optionally re-driven. In Spring Kafka this is a few lines: a DefaultErrorHandler + DeadLetterPublishingRecoverer.

Also harden the deserializer (Spring's ErrorHandlingDeserializer) so a single un-parseable record doesn't throw before your handler even runs. So-what: one bad apple should cost you one message in a DLQ, not your whole consumer.
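Stripped of the framework, the retry-then-park logic looks roughly like this. It is a library-free sketch rather than the Spring Kafka API: the list stands in for a DLQ topic, the class name DeadLetterSketch is hypothetical, and backoff plus the transient/non-recoverable distinction are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class DeadLetterSketch {
    final List<String> deadLetters = new ArrayList<>(); // stands in for a DLQ topic
    private final int maxAttempts;

    DeadLetterSketch(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    // Try the handler up to maxAttempts times; on exhaustion park the message and move on.
    // Returns true if the message was processed, false if it went to the DLQ.
    boolean handle(String message, Consumer<String> handler) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(message);
                return true;
            } catch (RuntimeException e) {
                // a real consumer would back off here and skip retries for non-recoverable errors
            }
        }
        deadLetters.add(message); // the caller can now commit the offset and continue
        return false;
    }
}
```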

05 · Java & DSA

Java & DSA coding

This was the section that most often pulled scores down — several candidates had the right approach for LRU but couldn't finish the pointer handling. The coding answers below are complete and compile; the concept questions are answered tightly.

Design an in-memory cache supporting get(key) and put(key, value), evicting the Least Recently Used item at capacity, with O(1) for both operations.

The winning data structure is a HashMap + doubly-linked list. The map gives O(1) lookup; the linked list maintains recency order — most-recently-used at the head, least-recently-used at the tail. On every access I unlink the node and move it to the head; eviction just drops the tail. The doubly-linked part is essential: it lets me unlink any node in O(1) (a singly-linked list would be O(n) to find the previous node).

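import java.util.HashMap;
import java.util.Map;

class LRUCache {
  // static: a node needs no reference back to the enclosing cache
  private static class Node { int key, val; Node prev, next;
    Node(int k, int v){ key=k; val=v; } }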

  private final int capacity;
  private final Map<Integer,Node> map = new HashMap<>();
  private final Node head, tail;          // dummy sentinels

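  LRUCache(int capacity){
    // capacity 0 would make put() try to evict the head sentinel
    if(capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
    this.capacity = capacity;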
    head = new Node(0,0); tail = new Node(0,0);
    head.next = tail; tail.prev = head;
  }
  private void remove(Node n){ n.prev.next = n.next; n.next.prev = n.prev; }
  private void addFront(Node n){
    n.next = head.next; n.prev = head;
    head.next.prev = n; head.next = n;
  }
  public int get(int key){
    if(!map.containsKey(key)) return -1;
    Node n = map.get(key);
    remove(n); addFront(n);          // mark most-recently-used
    return n.val;
  }
  public void put(int key, int value){
    if(map.containsKey(key)){
      Node n = map.get(key); n.val = value;
      remove(n); addFront(n);
    } else {
      if(map.size() == capacity){
        Node lru = tail.prev;       // evict from tail
        remove(lru); map.remove(lru.key);
      }
      Node n = new Node(key, value);
      addFront(n); map.put(key, n);
    }
  }
}

The two sentinel nodes (head/tail) are the trick that kills the edge cases — there's never a null prev/next to special-case. For a thread-safe production version I'd guard the methods, or just use Collections.synchronizedMap(new LinkedHashMap<>(cap, 0.75f, true){ removeEldestEntry... }) which gives LRU out of the box.
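The LinkedHashMap route mentioned above might look like this. It is a minimal sketch with a hypothetical class name, LinkedHashMapLru; it is not thread-safe on its own, so a shared instance would still be wrapped with Collections.synchronizedMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LinkedHashMapLru<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LinkedHashMapLru(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the most-recent end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least-recently-used entry once an insert overflows
    }
}
```

The third constructor argument is what turns insertion order into access order; without it the map would evict the oldest inserted entry (FIFO) rather than the least recently used one.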

In a high-reliability payment API, a user clicks "Pay," their internet drops, and they click "Pay" again — now you have two identical requests. How do you ensure the user is charged only once?

This is idempotency. The client generates a unique idempotency key (a UUID) for the logical payment attempt and sends it on the original request and on every retry (e.g. an Idempotency-Key header). The server uses that key to recognize the duplicate.

On the first request, the server atomically records the key (e.g. INSERT with a unique constraint, or Redis SET key NX), processes the charge, and stores the response against the key.
On the retry with the same key, the unique constraint / SETNX fails — the server detects "already processed," skips the charge, and returns the stored response. The user sees success once, is charged once.

The critical detail: the dedup check and the charge must be effectively atomic, and the key must be stored before or with the charge — otherwise two near-simultaneous requests both pass the check. A unique DB constraint on the idempotency key is the simplest correct backstop because the database enforces the atomicity for you.
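The "check and charge atomically" requirement can be sketched inside a single JVM with ConcurrentHashMap.computeIfAbsent, which applies the function at most once per key. This is an illustration under that single-process assumption, with hypothetical names; across multiple instances the same role is played by the unique constraint or Redis SET NX described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class IdempotentPaymentSketch {
    private final Map<String, String> responsesByKey = new ConcurrentHashMap<>();
    final AtomicInteger chargesMade = new AtomicInteger();

    // Runs the charge at most once per idempotency key; retries receive the stored response.
    String pay(String idempotencyKey, long amountCents) {
        return responsesByKey.computeIfAbsent(idempotencyKey, key -> {
            chargesMade.incrementAndGet();   // stands in for the call to the payment provider
            return "charged:" + amountCents; // stored as the response for this key
        });
    }
}
```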

Why is Executors.newFixedThreadPool(n) sometimes considered dangerous in high-load production? Explain back pressure.

Because newFixedThreadPool(n) is backed by an unbounded LinkedBlockingQueue. When tasks arrive faster than n threads can finish them, they pile up in that queue with no limit. Under sustained high load the queue grows until you hit OutOfMemoryError and the JVM dies — and worse, latency balloons silently long before that because tasks sit in the queue for ages. There's no signal to the producer that the system is overwhelmed.

Back pressure is the missing piece: a mechanism for a slow consumer to tell a fast producer "slow down, I can't keep up." The fix is a bounded queue plus an explicit rejection policy:

new ThreadPoolExecutor(core, max, keepAlive, SECONDS,
    new ArrayBlockingQueue<>(1000),        // bounded!
    new ThreadPoolExecutor.CallerRunsPolicy()); // back pressure

With CallerRunsPolicy, when the queue is full the submitting thread runs the task itself — which naturally slows the producer down (it can't submit more while busy). Other policies: AbortPolicy (throw — shed load), DiscardPolicy, etc. The principle: bound your queues and decide explicitly what happens at the limit, rather than letting an unbounded queue absorb load until it kills the process.
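To see a bounded queue push back, here is a small sketch that deliberately uses AbortPolicy instead of CallerRunsPolicy so the shed load can be counted. The pool sizes (one thread, queue of one) and the class name BoundedPoolSketch are assumptions chosen to make the behaviour easy to observe.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolSketch {
    // Submits 'tasks' blocking tasks to a pool with 1 thread and a queue of 1,
    // and returns how many submissions were rejected.
    static int countRejections(int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1), new ThreadPoolExecutor.AbortPolicy());
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // thread busy and queue full: the producer is told to back off
            }
        }
        release.countDown(); // let the accepted tasks finish
        pool.shutdown();
        return rejected;
    }
}
```

With five submissions, one task occupies the thread, one waits in the queue, and the remaining three are rejected immediately instead of piling up in memory.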

Rotate an array to the right by k steps. e.g. [1,2,3,4,5], k=2 → [4,5,1,2,3].

The elegant O(n) time, O(1) space trick is the three reversals: reverse the whole array, then reverse the first k, then reverse the rest. (First normalize k % n.)

void rotate(int[] a, int k){
  int n = a.length;
  if(n == 0) return;      // empty array: avoid k % 0
  k %= n;
  reverse(a, 0, n-1);     // [5,4,3,2,1]
  reverse(a, 0, k-1);     // [4,5,3,2,1]
  reverse(a, k, n-1);     // [4,5,1,2,3]
}
void reverse(int[] a, int i, int j){
  while(i < j){ int t=a[i]; a[i++]=a[j]; a[j--]=t; }
}

Why it works: the last k elements need to wrap to the front; reversing everything brings them to the front (but backwards), and the two sub-reversals fix the order of each half.

Coin Change — given coin denominations and an amount, return the fewest coins to make that amount, or −1 if impossible. (Infinite supply of each coin.)

This is a classic unbounded knapsack / bottom-up DP. dp[i] = fewest coins to make amount i. Initialize to "infinity," dp[0]=0, and for each amount try every coin.

int coinChange(int[] coins, int amount){
  int[] dp = new int[amount+1];
  Arrays.fill(dp, amount+1);   // "infinity"
  dp[0] = 0;
  for(int i=1; i<=amount; i++)
    for(int c : coins)
      if(c <= i) dp[i] = Math.min(dp[i], dp[i-c]+1);
  return dp[amount] > amount ? -1 : dp[amount];
}

O(amount × coins) time. The interviewer who scored this 8/10 also accepted a BFS framing — treat each amount as a node, each coin as an edge of weight 1, BFS from 0; the first time you reach amount is the fewest coins. Both are correct; DP is the more standard answer.
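The BFS framing can be sketched as follows, assuming positive coin values. Each BFS level corresponds to using one more coin, so the level at which `amount` is first reached is the answer. The class name CoinChangeBfs is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class CoinChangeBfs {
    static int coinChange(int[] coins, int amount) {
        if (amount == 0) return 0;
        boolean[] seen = new boolean[amount + 1]; // amounts already reached with fewer coins
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(0);
        seen[0] = true;
        int coinsUsed = 0;
        while (!queue.isEmpty()) {
            coinsUsed++; // every node expanded in this level uses one more coin
            for (int size = queue.size(); size > 0; size--) {
                int current = queue.poll();
                for (int c : coins) {
                    long next = (long) current + c; // long avoids int overflow for very large coins
                    if (next == amount) return coinsUsed;
                    if (next < amount && !seen[(int) next]) {
                        seen[(int) next] = true;
                        queue.add((int) next);
                    }
                }
            }
        }
        return -1; // queue drained without reaching the amount
    }
}
```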

Find the first non-repeating character in a string, e.g. "aabbcdde" → c.

Two passes with a count map (insertion-ordered so I can return the first):

Character firstNonRepeating(String s){
  Map<Character,Integer> cnt = new LinkedHashMap<>();
  for(char c : s.toCharArray())
    cnt.merge(c, 1, Integer::sum);
  for(var e : cnt.entrySet())
    if(e.getValue() == 1) return e.getKey();
  return null;
}

O(n) time. LinkedHashMap preserves insertion order so the second pass naturally yields the first unique character. For "aabbcdde": a,b,d repeat; c and e are unique; c comes first → c.

Check whether two strings are anagrams.

Two clean approaches. Sort both and compare (O(n log n)), or — better — count characters (O(n)):

boolean isAnagram(String a, String b){
  if(a.length() != b.length()) return false;
  int[] freq = new int[26];
  for(int i=0; i<a.length(); i++){
    freq[a.charAt(i)-'a']++;
    freq[b.charAt(i)-'a']--;
  }
  for(int f : freq) if(f != 0) return false;
  return true;
}

Increment for one string, decrement for the other; if every count nets to zero they're anagrams. The int[26] version assumes lowercase a–z input (any other character indexes outside the array); use a HashMap instead for Unicode.
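For input beyond lowercase a–z, the HashMap variant might look like the sketch below. It counts Unicode code points rather than chars so that characters outside the Basic Multilingual Plane are handled, and it does no case folding or normalization. The class name AnagramSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class AnagramSketch {
    static boolean isAnagram(String a, String b) {
        Map<Integer, Integer> freq = new HashMap<>(); // code point -> net count
        a.codePoints().forEach(cp -> freq.merge(cp, 1, Integer::sum));
        b.codePoints().forEach(cp -> freq.merge(cp, -1, Integer::sum));
        for (int count : freq.values()) {
            if (count != 0) return false; // some character appears more often in one string
        }
        return true;
    }
}
```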

What are the advantages and disadvantages of parallel streams, and when should you use them?

What they are: stream().parallel() splits the data and processes chunks across multiple threads using the shared ForkJoinPool.commonPool().

Advantages: on large datasets with CPU-bound, independent work, they use all cores and can give a real speed-up for free.

Disadvantages / dangers (the scorecard flagged that strong answers know the fork-join internals):

They use the common pool shared by the whole JVM — one slow parallel stream can starve everything else. Blocking/IO work in a parallel stream is especially bad: it ties up common-pool threads.
Overhead of splitting + merging often makes them slower than a sequential stream for small datasets.
Order-sensitive or stateful/shared-mutable operations break or need synchronization, killing the benefit and risking race conditions.

When to use: large dataset (rough rule: 10k+ elements), CPU-bound, stateless, associative operations, and you've measured a benefit. Otherwise default to sequential. For IO-bound parallelism, use a dedicated executor, not parallel streams.
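For the IO-bound case, the dedicated-executor alternative can be sketched with CompletableFuture. The names IoFanOutSketch and fetchAll are hypothetical, and `call` stands in for a blocking remote call. A fixed pool is acceptable here only because the number of submitted tasks is bounded by the input list.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

class IoFanOutSketch {
    // Runs 'call' for every input on a dedicated pool (not the common pool)
    // and returns the results in input order.
    static <T, R> List<R> fetchAll(List<T> inputs, Function<T, R> call, int threads) {
        ExecutorService ioPool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<R>> futures = inputs.stream()
                    .map(input -> CompletableFuture.supplyAsync(() -> call.apply(input), ioPool))
                    .collect(Collectors.toList()); // start every call before waiting on any
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            ioPool.shutdown();
        }
    }
}
```

Blocking calls now tie up only this pool's threads, so the common pool stays free for the rest of the JVM.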

Difference between the Factory and Strategy design patterns? Explain with a real-time example.

Both use polymorphism, but they answer different questions.

Factory (creational) — "which object do I create?" It centralizes object creation so the caller doesn't use new directly. Example: a PaymentFactory.create("UPI") returns a UpiPayment, create("CARD") returns a CardPayment. The factory's job ends once the object is made.
Strategy (behavioral) — "which algorithm do I run?" It lets you swap an interchangeable behavior at runtime. Example: a ShippingCostCalculator that takes a PricingStrategy — StandardPricing, SurgePricing, WeekendPricing — and you inject whichever you need; the calculator delegates to it.

The crisp distinction: Factory decides what to instantiate; Strategy decides how to behave. They compose well — a factory often creates the strategy object you then plug in.
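The composition can be sketched in a few lines. ShippingCostCalculator and PricingStrategy follow the example above; the `create` method, the kind strings, and the multipliers are made-up values for illustration.

```java
class FactoryStrategySketch {
    // Strategy: one interchangeable pricing algorithm.
    interface PricingStrategy {
        long priceCents(long baseCents);
    }

    // Factory: decides which strategy object to create.
    static PricingStrategy create(String kind) {
        switch (kind) {
            case "STANDARD": return base -> base;
            case "SURGE":    return base -> base * 2;        // assumed multiplier
            case "WEEKEND":  return base -> base + base / 2; // assumed multiplier
            default: throw new IllegalArgumentException("Unknown pricing kind: " + kind);
        }
    }

    // Context: delegates to whichever strategy it was given.
    static class ShippingCostCalculator {
        private final PricingStrategy strategy;

        ShippingCostCalculator(PricingStrategy strategy) {
            this.strategy = strategy;
        }

        long cost(long baseCents) {
            return strategy.priceCents(baseCents);
        }
    }
}
```

The factory's involvement ends at `create`; from then on the calculator only knows the PricingStrategy interface.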

Define the SOLID principles with real examples.

S — Single Responsibility. A class has one reason to change. A UserService shouldn't also format emails and write to disk — split those out.
O — Open/Closed. Open for extension, closed for modification. Add a new payment type by adding a class, not editing a giant switch. (Strategy pattern embodies this.)
L — Liskov Substitution. A subclass must be usable wherever its parent is, without surprises. The classic violation: Square extends Rectangle breaks code that sets width and height independently.
I — Interface Segregation. Many small focused interfaces beat one fat one. Don't force a SimplePrinter to implement fax() and scan() it'll never use.
D — Dependency Inversion. Depend on abstractions, not concretions. Inject a PaymentGateway interface, not a concrete StripeGateway — this is literally what Spring's DI gives you.

So-what: SOLID is the toolkit for keeping code changeable. The two that show up most in interviews are O (extend without editing) and D (program to interfaces) — both directly enabled by Spring's IoC container.

What are the trade-offs of switching from 4 separate REST calls (property → room types → rate plans → prices) to GraphQL?

The pain GraphQL solves here is real: nested data over REST means multiple round trips (the N+1 / under-fetching problem) or over-fetching fat payloads. GraphQL lets the client ask for the exact nested shape in one request, getting property + room types + rate plans + prices in a single round trip with no extra fields.

Advantages: one round trip, client-specified fields (no over/under-fetching), a strongly-typed schema, and easy evolution (add fields without versioning).

Trade-offs to name (this is what scores):

Caching is harder — REST gets free HTTP/CDN caching on GET URLs; GraphQL is typically one POST to /graphql, so you need client-side or field-level caching instead.
Server complexity — resolvers can reintroduce N+1 queries against the DB unless you add batching (DataLoader).
Expensive/abusive queries — a client can request deeply nested data; you need query depth/complexity limits.
Tooling/monitoring — standard HTTP status-code monitoring doesn't map cleanly (GraphQL often returns 200 with an errors array).

Verdict: for a read-heavy, deeply-nested UI like this, GraphQL is a good fit — but I'd weigh the lost CDN caching, since property data is also a great caching candidate.

06 · Database Knowledge

Databases & SQL

The DB questions split into scaling/optimization theory and hands-on SQL. The theory answers rewarded depth: indexing, EXPLAIN, read replicas, and the trade-offs between replication and sharding.

A large-scale system's database has grown over time; queries that once performed well are now slow. How do you approach database optimization?

I'd work outside-in, cheapest-fix-first:

1. Measure before touching anything. Turn on the slow-query log, find the worst offenders, and run EXPLAIN / EXPLAIN ANALYZE on them. Optimizing blind is the classic mistake.
2. Indexing. The most common culprit is a missing index causing a full table scan. Add indexes on columns in WHERE, JOIN, and ORDER BY — and use composite and covering indexes for hot queries. (But not too many: each index slows writes.)
3. Query rewriting. Eliminate SELECT *, avoid functions on indexed columns (kills index use), fix N+1 patterns, paginate with keyset instead of huge OFFSET.
4. Caching. Put Redis in front of hot read queries so they never hit the DB.
5. Scale the architecture. Read replicas to offload reads; partitioning (e.g. by date) so queries hit a smaller slice; archive cold data out of the hot table; and ultimately sharding for write scale.

The maturity signal the scorecard wanted: tie it to monitoring and real production data — "I'd look at the slow-query log and p99 over the last week," not just recite the list.

Explain the difference between Read Replicas and Database Sharding. If your problem is a high volume of INSERTs and UPDATEs, which one would you choose and why?

Read Replicas — copies of the database that stay in sync via replication. They scale reads: you route SELECTs to replicas, writes still go to the single primary. Simple to add, but they do nothing for write scaling — every write still bottlenecks on the one primary — and replicas are eventually consistent (replication lag).

Sharding — horizontally partitioning the data across multiple databases by a shard key (e.g. user_id). Each shard holds a subset and handles its own reads and writes. This scales writes because the write load is spread across many primaries.

For high INSERT/UPDATE volume: sharding. Read replicas can't help — the write bottleneck is the single primary, and adding replicas just adds more copies fed by that same overloaded primary. Sharding splits the writes across N shards, so each handles 1/N of the write traffic.

I'd name the cost honestly (the scorecard flagged this as the missing piece): sharding adds real complexity. Choosing a good shard key to avoid hot spots is hard, cross-shard queries and joins become hard, transactions across shards need a Saga or two-phase commit, and resharding/rebalancing is painful. So I'd shard only when a single beefy primary genuinely can't keep up with writes.
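Both the routing and the resharding pain show up in a toy router. This sketch assumes the simplest scheme, user_id modulo the shard count; the names ShardRouterSketch, shardFor, and movedKeys are hypothetical.

```java
class ShardRouterSketch {
    // Pick a shard for a user with simple modulo hashing.
    static int shardFor(long userId, int shardCount) {
        return (int) Math.floorMod(userId, (long) shardCount); // non-negative even for negative IDs
    }

    // Count how many of the user IDs 0..userCount-1 land on a different shard
    // when the shard count changes.
    static int movedKeys(int userCount, int oldShards, int newShards) {
        int moved = 0;
        for (long id = 0; id < userCount; id++) {
            if (shardFor(id, oldShards) != shardFor(id, newShards)) moved++;
        }
        return moved;
    }
}
```

With plain modulo, growing from 4 to 5 shards relocates 800 of 1,000 sequential IDs, which is why production systems tend to reach for consistent hashing or a lookup directory instead.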

This query takes 10 seconds on a 5-million-row table. How do you fix it?

SELECT order_id, customer_name, total_amount FROM orders WHERE status='COMPLETED' AND order_date > '2023-01-01' ORDER BY total_amount DESC LIMIT 10;

First, EXPLAIN it — I'd bet it's a full table scan plus a filesort. The fix is the right index. The query filters on status + order_date and sorts by total_amount, so a composite index that serves the filter and the sort:

CREATE INDEX idx_orders_status_amount
  ON orders (status, total_amount DESC, order_date);

Reasoning: lead with the equality predicate (status) so the index narrows to COMPLETED rows, then total_amount DESC lets the DB read the top rows already sorted — it can stop after 10 (no filesort), which is the whole game with LIMIT 10. order_date is a range predicate so it goes last (range columns can't be followed by useful sort columns).

To make it a covering index (so it never touches the table), I could include order_id and customer_name (INCLUDE in Postgres). Other levers if the table keeps growing: partition by order_date, or archive old completed orders out of the hot table.

Given orders(id, user_id, amount, created_at) and users(id, country), write a query for the top 5 countries by revenue in the last 30 days.

Join, filter the date window, group by country, sum, sort, limit:

SELECT   u.country,
         SUM(o.amount) AS revenue
FROM     orders o
JOIN     users  u ON u.id = o.user_id
WHERE    o.created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY u.country
ORDER BY revenue DESC
LIMIT    5;

The bug the scorecard saw candidates hit: putting the date condition in HAVING instead of WHERE, or forgetting that everything not aggregated must be in GROUP BY. Filter rows in WHERE (before grouping), aggregate, then sort by the aggregate.

Write a query to find the third-highest salaried employee.

The robust, duplicate-safe way uses DENSE_RANK() (so two people tied for #1 don't push #3 off):

SELECT name, salary
FROM (
  SELECT name, salary,
         DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
  FROM employees
) t
WHERE rnk = 3;

The simpler-but-flawed version is ORDER BY salary DESC LIMIT 1 OFFSET 2 — fine if salaries are unique, but it returns the wrong row when there are ties. DENSE_RANK handles "third-highest distinct salary" correctly, which is what the question really means.

Given employees(emp_id, emp_name, manager_id, salary), find employees earning more than their manager.

A self-join — join the table to itself, employee row to manager row, then compare salaries:

SELECT e.emp_name AS employee, e.salary, m.emp_name AS manager
FROM   employees e
JOIN   employees m ON e.manager_id = m.emp_id
WHERE  e.salary > m.salary;

The mental model: alias the same table twice — e is the employee, m is their manager (matched by e.manager_id = m.emp_id) — then the WHERE keeps only the rows where the employee out-earns the manager.

How is MongoDB different from SQL (relational) databases?

Dimension	SQL (e.g. Postgres)	MongoDB (NoSQL)
Data model	Tables, rows, fixed schema	Collections of flexible JSON-like documents
Schema	Rigid, enforced, migrations needed	Schema-less / flexible per document
Relationships	Joins across normalized tables	Embedded documents / denormalized; joins only via the aggregation pipeline's `$lookup` (a left outer join)
Scaling	Primarily vertical; sharding is bolt-on	Built for horizontal sharding
Transactions	Strong ACID, mature	ACID added later, single-doc strong; multi-doc available but heavier
Best for	Structured data, complex queries, strong consistency (payments, ledgers)	Evolving/unstructured data, high write throughput, document-shaped reads (catalogs, profiles)

The senior framing: it's not "better vs worse," it's "do I need relational integrity and complex joins (SQL), or flexible schema and horizontal scale (MongoDB)?" Many systems use both — Postgres for the transactional core, MongoDB for the flexible document parts.

07 · Communication

Communication & articulation

The "good-to-have" skill — but it appeared in every interview, and it's where structure beats jargon. The panel wanted to see you simplify for a non-technical audience and stay calm under a production fire.

You're leading a critical production deployment and an unexpected issue occurs after the release. You need to communicate the situation to non-technical stakeholders while managing expectations and next steps. How do you explain it?

I'd use a tight four-part structure and lead with impact, not internals — stakeholders care about the business effect, not the stack trace.

1. State the impact in their language. "Since the 2 p.m. release, roughly 15% of customers can't complete checkout. We're aware and actively on it." No jargon — translate "the payment service is throwing 500s" into "some customers can't pay."
2. Reassure with action + scope. What we're doing right now, and what's not affected ("existing orders and data are safe; this only affects new checkouts"). Bounding the problem calms people.
3. Set a realistic timeline with a buffer. "We expect a fix within the hour; I'll send an update in 30 minutes regardless." Adding buffer and committing to a next update time manages expectations — the scorecard specifically rewarded this.
4. Be honest and follow up. If we decide to roll back, say so plainly. Afterward, a short blameless post-mortem on root cause and prevention.

The meta-skill: under a production fire, stakeholders mostly need to feel that someone competent is in control and communicating. Calm, regular, jargon-free updates do that.

How do you handle technical debt in Agile teams while still maintaining delivery pace?

The framing I'd give: tech debt isn't a cleanup project you do "someday" — it's managed continuously, like financial debt, by paying it down a bit each sprint.

Make it visible. Track debt items in the same backlog as features, not a hidden TODO list. You can't prioritize what you can't see.
Prioritize by impact × risk. Not all debt is equal — debt in a hot, frequently-changed, high-risk area gets paid first; debt in stable code that nobody touches can wait. This impact/priority framing is exactly what the scorecard credited.
Budget a fixed slice. Allocate ~15–20% of each sprint to debt/refactoring so it's continuous, never a big-bang rewrite that blocks delivery.
Boy-scout rule + definition of done. Leave code cleaner than you found it; bake quality (tests, review) into "done" so new debt accrues slower.
Speak business language to stakeholders. Frame paydown as "this refactor cuts our bug rate and speeds up the next three features," not "the code is ugly." That's how you get buy-in to spend the time.

So-what: you maintain pace precisely by servicing debt steadily — unmanaged debt is what eventually grinds delivery to a halt.