Building a Distributed Search Engine: HTTP Networking Layer in Go

February 2, 2026

In the previous section, we built the TF-IDF algorithm for ranking documents. Now it's time to make our search engine distributed. In this post, we'll build the HTTP networking layer that enables communication between nodes.

By the end, you'll have a reusable HTTP server and client that can:

  • Handle concurrent requests using goroutines
  • Validate HTTP methods (POST only for task endpoints)
  • Send tasks to multiple workers in parallel
  • Gracefully handle failures

Architecture Overview

Our distributed search cluster has two types of nodes:

┌──────────────┐      HTTP POST      ┌──────────────┐
│ Coordinator  │ ──────────────────► │   Worker 1   │
│   /search    │                     │    /task     │
└──────┬───────┘                     └──────────────┘
       │              HTTP POST      ┌──────────────┐
       └───────────────────────────► │   Worker 2   │
                                     │    /task     │
                                     └──────────────┘
  • Coordinator: Receives search queries, splits work, aggregates results
  • Workers: Process document subsets, return term frequencies

Both need HTTP servers. The coordinator also needs an HTTP client to talk to workers.

The RequestHandler Interface

First, let's define a clean interface that both coordinator and worker will implement:

package networking

// RequestHandler defines the interface for handling HTTP requests.
type RequestHandler interface {
    // HandleRequest processes a request payload and returns a response.
    HandleRequest(payload []byte) []byte

    // GetEndpoint returns the HTTP endpoint path for this handler.
    // For example, "/search" for coordinator or "/task" for workers.
    GetEndpoint() string
}

This interface-based design gives us:

  • Testability: Easy to mock handlers in tests (see the sketch below)
  • Flexibility: Swap implementations without changing server code
  • Clean separation: Server handles HTTP, handler handles business logic
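
For example, the property tests later in this post call NewMockHandler and GetRequestCount. The post never shows that mock, so this sketch is an assumption about its shape:

package networking

import "sync/atomic"

// MockHandler is a test double for RequestHandler. Assumed definition;
// the tests below reference it but the post does not show it.
type MockHandler struct {
    endpoint     string
    response     []byte
    requestCount atomic.Int64
}

func NewMockHandler(endpoint string, response []byte) *MockHandler {
    return &MockHandler{endpoint: endpoint, response: response}
}

// HandleRequest records the call and returns the canned response.
func (m *MockHandler) HandleRequest(payload []byte) []byte {
    m.requestCount.Add(1)
    return m.response
}

func (m *MockHandler) GetEndpoint() string { return m.endpoint }

// GetRequestCount reports how many requests the mock has handled.
func (m *MockHandler) GetRequestCount() int { return int(m.requestCount.Load()) }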

Building the WebServer

Our server needs to:

  1. Expose a /status health check endpoint
  2. Route requests to the handler's endpoint
  3. Only accept POST requests on the handler endpoint
  4. Handle concurrent requests with goroutines

package networking

import (
    "context"
    "fmt"
    "io"
    "log"
    "net/http"
    "sync"
    "time"
)

// WebServer serves a single RequestHandler plus a /status health check.
type WebServer struct {
    port    int
    handler RequestHandler
    server  *http.Server
    mu      sync.Mutex
    running bool
}

// NewWebServer creates a server for the given port and handler.
// The server does not listen until Start is called.
func NewWebServer(port int, handler RequestHandler) *WebServer {
    return &WebServer{
        port:    port,
        handler: handler,
    }
}

Starting the Server

// Start registers the endpoints and begins serving. It blocks until the
// server exits, so callers typically run it in its own goroutine.
func (ws *WebServer) Start() error {
    ws.mu.Lock()
    if ws.running {
        ws.mu.Unlock()
        return fmt.Errorf("server already running")
    }

    mux := http.NewServeMux()

    // Health check endpoint
    mux.HandleFunc("/status", ws.handleStatus)

    // Dynamic endpoint based on handler
    if ws.handler != nil {
        endpoint := ws.handler.GetEndpoint()
        mux.HandleFunc(endpoint, ws.handleRequest)
        log.Printf("Registered endpoint: %s", endpoint)
    }

    ws.server = &http.Server{
        Addr:         fmt.Sprintf(":%d", ws.port),
        Handler:      mux,
        ReadTimeout:  30 * time.Second,
        WriteTimeout: 30 * time.Second,
    }
    ws.running = true
    ws.mu.Unlock()

    log.Printf("Starting HTTP server on port %d", ws.port)
    return ws.server.ListenAndServe()
}
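
Note that ListenAndServe blocks, and after a graceful Shutdown it returns http.ErrServerClosed, which callers should treat as normal. The /status handler registered above isn't shown in the post; a minimal sketch would be:

// handleStatus answers the health check. Assumed implementation; the
// post registers this handler but never shows its body.
func (ws *WebServer) handleStatus(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "text/plain")
    w.Write([]byte("OK"))
}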

Handling Requests

The key insight: we only accept POST requests on the handler endpoint. This is a deliberate design choice: search tasks carry binary payloads that don't fit in query strings.

func (ws *WebServer) handleRequest(w http.ResponseWriter, r *http.Request) {
    // Only accept POST requests
    if r.Method != http.MethodPost {
        http.Error(w, "Method not allowed. Use POST.", http.StatusMethodNotAllowed)
        return
    }
    defer r.Body.Close()

    // Read request body
    body, err := io.ReadAll(r.Body)
    if err != nil {
        http.Error(w, "Failed to read request body", http.StatusBadRequest)
        return
    }

    // Process through handler
    response := ws.handler.HandleRequest(body)

    // Send response
    w.Header().Set("Content-Type", "application/octet-stream")
    w.WriteHeader(http.StatusOK)
    w.Write(response)
}

Graceful Shutdown

Production servers need clean shutdown to avoid dropping in-flight requests:

func (ws *WebServer) Stop(ctx context.Context) error {
    ws.mu.Lock()
    defer ws.mu.Unlock()

    if !ws.running || ws.server == nil {
        return nil
    }

    log.Printf("Stopping HTTP server on port %d", ws.port)
    ws.running = false
    return ws.server.Shutdown(ctx)
}

The Shutdown method stops accepting new connections and waits for active ones to finish; if the supplied context expires first, it returns the context's error.
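
Putting Start and Stop together, a worker's main might look like this. Treat it as a sketch: the port, the signal handling, the five-second grace period, and the networking import path are my assumptions, and myHandler stands in for whatever RequestHandler you register:

package main

import (
    "context"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"

    "github.com/UnplugCharger/distributed_doc_search/internal/networking"
)

func main() {
    server := networking.NewWebServer(8081, myHandler) // myHandler: your RequestHandler (placeholder)

    // Serve in the background; ErrServerClosed just means a clean Shutdown.
    go func() {
        if err := server.Start(); err != nil && err != http.ErrServerClosed {
            log.Fatalf("server error: %v", err)
        }
    }()

    // Wait for SIGINT/SIGTERM, then give in-flight requests 5s to drain.
    stop := make(chan os.Signal, 1)
    signal.Notify(stop, os.Interrupt, syscall.SIGTERM)
    <-stop

    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()
    if err := server.Stop(ctx); err != nil {
        log.Printf("shutdown error: %v", err)
    }
}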

Building the WebClient

The coordinator needs to send tasks to workers concurrently. Let's build a client that:

  1. Sends POST requests with binary payloads
  2. Handles multiple workers in parallel
  3. Gracefully handles failures (no crashes!)

package networking

import (
    "bytes"
    "context"
    "io"
    "log"
    "net/http"
    "sync"
    "time"

    "github.com/UnplugCharger/distributed_doc_search/internal/model"
)

// WebClient wraps an http.Client for talking to workers.
type WebClient struct {
    client *http.Client
}

// NewWebClient creates a client with a 30-second per-request timeout.
func NewWebClient() *WebClient {
    return &WebClient{
        client: &http.Client{
            Timeout: 30 * time.Second,
        },
    }
}

Sending a Single Task

// SendTask POSTs a serialized task to a worker URL and deserializes the result.
func (wc *WebClient) SendTask(ctx context.Context, url string, payload []byte) (*model.Result, error) {
    req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(payload))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Content-Type", "application/octet-stream")

    resp, err := wc.client.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        return nil, &HTTPError{StatusCode: resp.StatusCode, URL: url}
    }

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        return nil, err
    }
    return model.DeserializeResult(body)
}
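
SendTask returns an HTTPError for non-200 responses. The post never shows that type; a minimal definition consistent with the usage above would be:

// HTTPError reports a non-200 response from a worker. Assumed definition;
// the post uses this type without showing it. (Requires the "fmt" import.)
type HTTPError struct {
    StatusCode int
    URL        string
}

func (e *HTTPError) Error() string {
    return fmt.Sprintf("HTTP %d from %s", e.StatusCode, e.URL)
}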

Concurrent Task Distribution

This is where Go shines. We fan out one goroutine per task, wait on a sync.WaitGroup, and guard writes to the shared results slice with a mutex:

func (wc *WebClient) SendTasksConcurrently(
    ctx context.Context,
    workers []string,
    tasks []*model.Task,
) []*model.Result {
    if len(workers) == 0 || len(tasks) == 0 {
        return nil
    }

    numTasks := min(len(workers), len(tasks))
    results := make([]*model.Result, numTasks)

    var wg sync.WaitGroup
    var mu sync.Mutex

    for i := 0; i < numTasks; i++ {
        wg.Add(1)
        go func(index int, workerURL string, task *model.Task) {
            defer wg.Done()

            payload, err := model.Serialize(task)
            if err != nil {
                log.Printf("Failed to serialize task: %v", err)
                return
            }

            result, err := wc.SendTask(ctx, workerURL, payload)
            if err != nil {
                log.Printf("Worker %s failed: %v", workerURL, err)
                return // Graceful failure - don't crash!
            }

            mu.Lock()
            results[index] = result
            mu.Unlock()
        }(i, workers[i], tasks[i])
    }

    wg.Wait()
    return results
}

Key design decisions:

  • Graceful failure: If one worker fails, we log the error and continue. The coordinator can work with partial results.
  • Index preservation: results[i] corresponds to workers[i] and tasks[i], so a failed worker leaves a nil at its index rather than shifting everything else.
  • Context support: Callers can cancel all in-flight requests through the context (usage sketched below).
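
Here's roughly how the coordinator will drive this in Part 3. Treat it as a sketch: the worker URLs, the ten-second deadline, and the splitTasks helper are placeholders, and client (a *WebClient) and query are assumed to be in scope:

// Hypothetical coordinator-side usage of SendTasksConcurrently.
workers := []string{
    "http://worker1:8081/task", // placeholder URLs
    "http://worker2:8081/task",
}
tasks := splitTasks(query, len(workers)) // hypothetical helper returning []*model.Task

ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

results := client.SendTasksConcurrently(ctx, workers, tasks)
for i, r := range results {
    if r == nil {
        log.Printf("no result from %s (failed or cancelled)", workers[i])
        continue // proceed with partial results
    }
    // aggregate r into the final ranking...
}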

Property-Based Testing

We use property-based tests to verify our HTTP layer behaves correctly for any input.

Property 11: HTTP Method Validation

func TestHTTPMethodValidation(t *testing.T) {
    properties := gopter.NewProperties(gopter.DefaultTestParameters())

    // POST requests should be processed
    properties.Property("POST requests are processed by handler", prop.ForAll(
        func(payload string) bool {
            handler := NewMockHandler("/task", []byte("response"))
            // ... setup test server ...
            resp, _ := http.Post(ts.URL+"/task", "application/octet-stream",
                bytes.NewReader([]byte(payload)))
            return resp.StatusCode == http.StatusOK &&
                handler.GetRequestCount() == 1
        },
        gen.AlphaString(),
    ))

    // Non-POST requests should be rejected
    properties.Property("non-POST requests are rejected", prop.ForAll(
        func(method string) bool {
            if method == http.MethodPost {
                return true // Skip POST
            }
            handler := NewMockHandler("/task", []byte("response"))
            // ... setup test server ...
            req, _ := http.NewRequest(method, ts.URL+"/task", nil)
            resp, _ := client.Do(req)
            return resp.StatusCode == http.StatusMethodNotAllowed &&
                handler.GetRequestCount() == 0
        },
        gen.OneConstOf("GET", "PUT", "DELETE", "PATCH", "HEAD", "OPTIONS"),
    ))

    properties.TestingRun(t)
}

Property 12: Concurrent Request Independence

func TestConcurrentRequestIndependence(t *testing.T) {
    properties := gopter.NewProperties(gopter.DefaultTestParameters())

    properties.Property("concurrent requests are processed independently", prop.ForAll(
        func(numRequests int) bool {
            numRequests = clamp(numRequests, 1, 20)
            processedRequests := make(map[int]bool)
            var mu sync.Mutex

            // Setup server that tracks request IDs
            mux := http.NewServeMux()
            mux.HandleFunc("/task", func(w http.ResponseWriter, r *http.Request) {
                body, _ := io.ReadAll(r.Body)
                var requestID int
                fmt.Sscanf(string(body), "request-%d", &requestID)

                mu.Lock()
                processedRequests[requestID] = true
                mu.Unlock()

                time.Sleep(5 * time.Millisecond) // Simulate work
                w.Write([]byte(fmt.Sprintf("response-%d", requestID)))
            })
            ts := httptest.NewServer(mux)
            defer ts.Close()

            // Send concurrent requests
            var wg sync.WaitGroup
            responses := make([]bool, numRequests)
            for i := 0; i < numRequests; i++ {
                wg.Add(1)
                go func(id int) {
                    defer wg.Done()
                    resp, _ := http.Post(ts.URL+"/task", "text/plain",
                        bytes.NewReader([]byte(fmt.Sprintf("request-%d", id))))
                    body, _ := io.ReadAll(resp.Body)
                    responses[id] = string(body) == fmt.Sprintf("response-%d", id)
                }(i)
            }
            wg.Wait()

            // Verify all requests processed with correct responses
            return len(processedRequests) == numRequests &&
                allTrue(responses)
        },
        gen.IntRange(1, 20),
    ))

    properties.TestingRun(t)
}

This test verifies that:

  • All N concurrent requests are processed
  • Each request gets its own unique response
  • No requests block each other
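
The test uses two small helpers that the post doesn't define; under the obvious assumptions they would be:

// clamp restricts v to [lo, hi]. (With gen.IntRange(1, 20) as the generator
// it is belt-and-braces, but it keeps the property total for any int.)
func clamp(v, lo, hi int) int {
    if v < lo {
        return lo
    }
    if v > hi {
        return hi
    }
    return v
}

// allTrue reports whether every element of bs is true.
func allTrue(bs []bool) bool {
    for _, b := range bs {
        if !b {
            return false
        }
    }
    return true
}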

Test Results

=== RUN TestHTTPMethodValidation
+ POST requests are processed by handler: OK, passed 100 tests.
+ non-POST requests are rejected: OK, passed 100 tests.
--- PASS: TestHTTPMethodValidation (0.07s)
=== RUN TestConcurrentRequestIndependence
+ concurrent requests are processed independently: OK, passed 100 tests.
--- PASS: TestConcurrentRequestIndependence (0.96s)

Key Takeaways

  1. Interface-based design: RequestHandler decouples HTTP handling from business logic
  2. Graceful failure: Workers can fail without crashing the coordinator
  3. Concurrent by default: Go's goroutines make parallel requests trivial
  4. Property-based testing: Catches edge cases in concurrent code that unit tests miss

What's Next?

In Part 3, we'll build the SearchWorker and SearchCoordinator that implement the RequestHandler interface. The coordinator will:

  • Split documents among workers
  • Send tasks concurrently using our WebClient
  • Aggregate results and calculate final TF-IDF scores

Get the Code

git clone git@github.com:UnplugCharger/distributed_doc_search.git
cd distributed_doc_search
go test ./test/property/... -v

This post is part of the "Distributed Document Search" series. Follow along as we build a production-ready search cluster from scratch.