Computational and mathematical modelling has become a valuable tool for investigating biological systems. Modelling enables prediction of how biological components interact to deliver system-level properties, and extrapolation of biological system performance to contexts and experimental conditions where this is unknown. A model's value hinges on knowing that it faithfully represents the biology under the contexts of use, or clearly ascertaining otherwise and thus motivating further model refinement. These qualities are evaluated through calibration, typically formulated as identifying model parameter values that align model and biological behaviours as measured through a metric applied to both. Calibration is critical to modelling but is often underappreciated. Failure to calibrate appropriately risks producing unrepresentative models that generate erroneous insights. Here, we review a suite of strategies to more rigorously challenge a model's representation of a biological system. All are motivated by features of biological systems, and illustrative examples are drawn from the modelling literature. We examine the calibration of a model against distributions of biological behaviours or outcomes, not only average values. We argue for calibration even where model parameter values are experimentally ascertained. We explore how single metrics can be non-distinguishing for complex systems, with multiple configurations of component dynamics and interactions giving rise to the same metric output. Under these conditions, calibration is insufficiently constraining and the model is non-identifiable: multiple solutions to the calibration problem exist. We draw an analogy to curve fitting and argue that calibrating a biological model against a single experiment or context is akin to curve fitting against a single data point. We also explore how metrics that quantify heavily emergent properties, though useful for communicating model results, may not be suitable for use in calibration.
Lastly, we consider the role of sensitivity and uncertainty analysis in calibration and in the interpretation of model results. Our goal in this manuscript is to encourage a deeper consideration of calibration, and of how to increase its capacity either to deliver faithful models or to demonstrate that they are not.