Avoid Propagating Invalid Data

I’ve previously written about which kinds of return values we should consider acceptable. Today I would like to expand on this topic.

One anti-pattern that I see frequently goes a bit like this: a data retrieval class, say CustomerService, returns Customer objects from one or more of its methods. Let’s assume that, to be usable by callers, these Customer objects must have a valid, non-null organisation identifier, OrgId.

That’s the scenario.

What is the problem?

The problem occurs when CustomerService neglects to filter out invalid Customer data and leaves that task to its callers.

We might see code similar to this listing:

  var customers = CustomerService.SearchCustomers(criteria);

  foreach (var customer in customers)
  {
     if (customer.OrgId != null)
     {
        // Customer is valid because it has an OrgId.
        // Do something useful with customer.
     }
  }

Because CustomerService did not filter out invalid data, all its clients have to. A better design manages validity problems closer to the source.
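To make that concrete, here is a minimal sketch of a CustomerService that filters at the source. The Customer record shape, the in-memory store, and the predicate-style criteria parameter are all my assumptions for illustration; the real service presumably talks to a database and takes a richer criteria object.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape: only OrgId comes from the scenario above.
public sealed record Customer(string Name, string? OrgId);

public sealed class CustomerService
{
    // Assumed in-memory stand-in for the real data source.
    private readonly IReadOnlyList<Customer> _store;

    public CustomerService(IEnumerable<Customer> store) => _store = store.ToList();

    // Returns customers that match the criteria AND have a non-null OrgId,
    // so callers never see records that are invalid in this context.
    public IReadOnlyList<Customer> SearchCustomers(Func<Customer, bool> criteria) =>
        _store.Where(c => c.OrgId is not null)
              .Where(criteria)
              .ToList();
}
```

With the check living in one place, the caller’s loop can drop its `customer.OrgId != null` guard and go straight to the useful work.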

If we let invalid data propagate throughout the program, no data can be trusted and validity checks must happen everywhere. That’s why we check input data at system boundaries.

It’s helpful to keep in mind that this is only an anti-pattern for data that is obviously invalid within a given context. If Customer objects are allowed to lack an OrgId in some scenarios, then the point is moot. However, we can still do better by providing more tailored CustomerService methods: instead of SearchCustomers() we could have SearchOrganisationCustomers(), which would return only Customers with valid OrgId values.
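A rough sketch of that tailored-method idea might look like the following. As before, the Customer record, the in-memory store, and the predicate-style criteria are assumptions of mine, not part of any real API.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape: OrgId may legitimately be null for some customers.
public sealed record Customer(string Name, string? OrgId);

public sealed class CustomerService
{
    // Assumed in-memory stand-in for the real data source.
    private readonly IReadOnlyList<Customer> _store;

    public CustomerService(IEnumerable<Customer> store) => _store = store.ToList();

    // General search: customers without an OrgId are acceptable here.
    public IReadOnlyList<Customer> SearchCustomers(Func<Customer, bool> criteria) =>
        _store.Where(criteria).ToList();

    // Tailored search: guarantees every returned customer has an OrgId.
    public IReadOnlyList<Customer> SearchOrganisationCustomers(Func<Customer, bool> criteria) =>
        _store.Where(c => c.OrgId is not null)
              .Where(criteria)
              .ToList();
}
```

The method name now carries the guarantee, so a caller that needs an OrgId picks SearchOrganisationCustomers() and skips the null check, while callers that don’t care keep using SearchCustomers().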

My recommendation is to test for data validity closer to the source, which reduces or eliminates the fan-out of downstream data filtering and checking.