Public Services: International comparison of Public Services

Project: Research project (funded)

Project Details

Description

Project Background

International comparison has become one of the most influential levers for change in public services. By establishing benchmarks for examining performance, cross-national comparison offers opportunities for countries to assess their place in relation to others; to learn from experience elsewhere; to promote accountability to their citizens; and, perhaps most importantly, to evaluate performance systematically. The value placed on comparative information is perhaps best evidenced by the extensive resources invested by national and international organisations in the collection, publication and analysis of cross-national data.

The data on which international comparisons have traditionally relied have often been reported at a broad aggregate country level, frequently leading to highly contested and inconclusive results. More recently, there has been a shift in focus towards the use of data measured at an individual level, often derived from administrative records or cross-country surveys. In the context of health systems, measures of performance are becoming increasingly reliant on the perspective of the user, and patients’ views and opinions have long been recognised as a legitimate means for assessing the provision of services (Coulter and Magee, 2003).

A fundamental concern for cross-national surveys is the comparability of the data collected. Measures of performance that rely on self-reported data pose specific challenges for comparative analyses. In particular, a critical issue is variation, both within and between countries, in how individuals use survey response categories. Do individuals with different socio-economic or demographic characteristics rate a fixed level of performance differently? Are these differences more pronounced at country level? What methods can be used to adjust self-reported data to produce greater comparability? What impact, if any, does systematic reporting behaviour have on cross-country comparative analysis of health system performance? Drawing on survey data on health system responsiveness, this project set out to examine these questions and to offer policymakers and international agencies useful information on the future design and use of cross-country surveys of public sector performance.

Health Systems Responsiveness

The concept of responsiveness as a measure of health system performance was developed and promoted by the World Health Organization (WHO). Responsiveness is defined as aspects of the way individuals are treated, and the environment in which they are treated, during health system interactions (Valentine, 2003). The concept covers a set of non-clinical and non-financial dimensions of quality of care that reflect respect for human rights and interpersonal aspects of the care process (Valentine et al., 2009). These are measured across eight domains chosen to reflect the goals for health care processes and systems that individuals value highly in their contact with health systems.

Objectives

The challenge of how to compare appropriately across institutional settings and populations is a central feature of cross-country comparative work for all public services. This study aimed to interrogate perhaps the most ambitious cross-country comparative instrument yet implemented in the domain of public services: the World Health Survey (WHS).

The overarching aim of the research was to assess the usefulness of hierarchical ordered probit (HOPIT) methods, applied to household surveys and anchoring vignettes, for securing international comparisons of public service responsiveness. Specific objectives were to:

1. Interrogate the WHS data for its suitability for cross-national comparison of public service performance;
2. Assess the use of self-reported measures of responsiveness as an appropriate measure of public service performance;
3. Advance the use of anchoring vignettes to control for differential and systematic reporting of public service responsiveness;
4. Investigate the socio-economic and demographic determinants of public service responsiveness and examine the drivers of reporting heterogeneity in cross-national comparative research on system performance;
5. Assess the strengths and weaknesses of the HOPIT model as a suitable approach to securing comparisons between groups of survey respondents;
6. Make recommendations on the utility of anchoring vignettes used alongside self-reports of public service performance that may inform the design of future cross-national surveys.

Data

In order to address the above research questions, we gained access to data from the World Health Survey (WHS). The WHS is an initiative launched by the WHO in 2001 aimed at strengthening national capacity to monitor critical health outputs and outcomes through the fielding of a valid, reliable and comparable household survey instrument (see Üstün et al., 2003b). In total, seventy countries participated in the WHS in 2002-2003. All samples were drawn from nationally representative frames with known probabilities of selection, resulting in sample sizes of between 600 and 10,000 respondents across the countries surveyed.

Health systems responsiveness and anchoring vignettes

Measures of responsiveness were obtained by asking respondents to rate their most recent experience of contact with the health system within the following eight domains: prompt attention, dignity, communication, autonomy, confidentiality, choice, quality of basic amenities, and access to family and community support. For each domain, at most two item questions were asked of respondents. Five response categories were available to respondents when rating their experience of health systems: “very good”, “good”, “moderate”, “bad” and “very bad”.

The WHS further contains a number of vignettes describing the experiences of hypothetical individuals within each of the domains. Five vignettes are available for each question within each domain. The response scale available to respondents answering the vignettes is the same as the scale used when rating their own experiences of health system responsiveness.

Socio-demographic variables

We further made use of respondent characteristics contained in the WHS. These include age, gender, level of education and income. Level of education was measured both as the number of years of education and as a categorical variable representing educational attainment. Income was an indirect measure of household permanent income based on a series of questions on ownership of physical assets. We made use of a categorisation of the permanent income variable indicating the within-country tertile (or quintile) of the income distribution into which the measure of permanent income falls, as sketched below. We used these variables as regressors in models of responsiveness across the individual domains considered and also as determinants of reporting behaviour.
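
To illustrate the construction, a within-country tertile variable of this kind can be computed as in the following minimal Python sketch. The frame and column names (whs, country, perm_income) are placeholders of our own choosing, not WHS variable names, and the values are invented.

    import pandas as pd

    # Hypothetical extract: an asset-based permanent income index per respondent.
    whs = pd.DataFrame({
        "country": ["A", "A", "A", "B", "B", "B"],
        "perm_income": [0.2, 1.4, 0.9, 2.1, 0.3, 1.0],
    })

    # Within-country tertile of the permanent income distribution (1 = poorest).
    whs["income_tertile"] = (
        whs.groupby("country")["perm_income"]
           .transform(lambda s: pd.qcut(s, 3, labels=[1, 2, 3]))
    )
    print(whs)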

Example vignette
Dignity – item: respectful treatment and communication
Respondent own rating:
How would you rate:
1 – being greeted and talked to respectfully?
2 – the respect for privacy during physical examination and treatments?
Example vignette:
[Anya] took her baby for a vaccination. The nurse said hello, but did not ask for [Anya’s] or the baby’s name. The nurse also examined [Anya] and made her remove her shirt in the waiting room.
Q1: How would you rate her experience of being greeted and talked to respectfully?
Q2: How would you rate the way her privacy was respected during physical examination and treatments?

Communication – item: clarity of communication
Respondent own rating:
How would you rate:
1 – how clearly health care providers explained things to you?
2 – the time you get to ask questions about your health problems or treatment?
Example vignette:
[Rose] cannot write or read. She went to the doctor because she was feeling dizzy. The doctor didn’t have time to answer her questions or to explain anything. He sent her away with a piece of paper without telling her what it said.
Q1: How would you rate her experience of how clearly health care providers explained things to her?
Q2: How would you rate her experience of getting enough time to ask questions about her health problem or treatment?

Country-level indicators

We further considered a number of aggregate country-level indicators, including GDP and health care expenditure as a proportion of GDP. For models comparing performance across countries we also included country-level dummy variables to capture characteristics of a country that might impact on both the reporting of responsiveness and the level of responsiveness itself. This proved more satisfactory than including aggregate country-specific indicators, and we focus on the dummy-variable approach in this report.

In order to stratify countries into more homogeneous groups prior to analysis, we grouped countries according to the United Nations Human Development Index (HDI; United Nations Development Programme, 2006).
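
A minimal sketch of this stratification, assuming the conventional UNDP bands of the period (low below 0.5, medium 0.5 to 0.8, high above 0.8); the country names and HDI values below are illustrative only.

    import pandas as pd

    # Illustrative HDI values for three hypothetical countries.
    hdi = pd.Series({"CountryA": 0.43, "CountryB": 0.68, "CountryC": 0.91})

    # Stratify countries into low/medium/high human development bands.
    hdi_band = pd.cut(hdi, bins=[0.0, 0.5, 0.8, 1.0],
                      labels=["low", "medium", "high"])
    print(hdi_band)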

Methods

Descriptive analysis

To ensure that the WHS is fit for purpose, we first undertook exploratory analysis of the psychometric properties of the responsiveness instrument. This considered three key desirable properties of a survey instrument: feasibility (ease of administering an instrument), reliability (test-retest properties of an instrument) and validity (the structure of the responsiveness concept). We compared the results for the WHS with those for the survey’s precursor, the Multi-Country Survey Study (MCS Study; Üstün, 2001), which has undergone extensive data scrutiny.
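
By way of illustration, the test-retest reliability of a single ordinal item can be summarised with a weighted kappa statistic, as in the sketch below; the ratings are invented, and the published WHS/MCS analyses may rely on different statistics.

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical ratings of one item at a first interview and a re-interview
    # (5 = "very good", ..., 1 = "very bad").
    first  = [5, 4, 4, 3, 2, 5, 1, 3]
    retest = [5, 4, 3, 3, 2, 4, 1, 3]

    # Quadratic weights penalise large disagreements on the ordinal scale.
    print(cohen_kappa_score(first, retest, weights="quadratic"))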

Differential reporting behaviour and anchoring vignettes

Cross-country analyses will fail to achieve comparability if individuals, when faced with an instrument involving ordinal response categories, interpret the meaning of the response categories in a way that systematically differs across populations or population sub-groups (Sadana et al., 2002). It is natural for good or poor system performance to mean different things to different people, so that for any given objective level of performance individuals may differ in their subjective rating. This differential mapping from the underlying latent construct of interest (objective performance) to the available response categories is a source of differential reporting behaviour. For example, two individuals may face the same level of performance but rate it differently because they use different sets of thresholds to divide the response categories. A casual inspection of the ratings would then suggest that one of the two individuals faces poorer health system performance than the other. The analogy extends to comparisons of performance across countries.
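
The mechanism can be made concrete with a small numeric sketch (all numbers invented): two respondents face the same latent level of performance but divide the response scale with different thresholds, and so report different categories.

    import bisect

    CATEGORIES = ["very bad", "bad", "moderate", "good", "very good"]

    def rate(latent, thresholds):
        # Map a latent performance level to a category via ordered thresholds.
        return CATEGORIES[bisect.bisect(thresholds, latent)]

    latent = 0.30                            # same objective level for both
    thresholds_a = [-1.0, -0.3, 0.5, 1.2]    # respondent A: demanding use of scale
    thresholds_b = [-1.5, -0.8, 0.1, 0.9]    # respondent B: lenient use of scale

    print(rate(latent, thresholds_a))        # -> "moderate"
    print(rate(latent, thresholds_b))        # -> "good"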

Faced with respondent self-reports of underlying performance, adjusting for differences in the location of the thresholds is fundamental to achieving comparative analysis across individuals and countries. Our approach makes use of anchoring vignettes to explore the characteristics of variation in reporting behaviour. Since vignettes describe fixed and pre-determined levels of performance, systematic variation across individuals in the rating of the vignettes can be ascribed to differences in reporting. The information in the vignettes allows us to model the response scales, or thresholds, as a function of the characteristics of respondents.

Knowledge of reporting behaviour is then used to adjust the self-reported data obtained from survey respondents on their experiences of system performance. Once this is achieved, we anchor the adjusted reports to a common scale (e.g. to the reporting behaviour of a benchmark country) to produce more comparable cross-country analyses.
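
In outline, the anchoring step re-scores each respondent’s fitted latent responsiveness using one common set of thresholds, for example those estimated for a benchmark country. The sketch below assumes the latent indices and benchmark thresholds have already been estimated (for instance from the HOPIT model described next); all values shown are placeholders.

    import numpy as np

    def rescore(latent_index, benchmark_thresholds):
        # Assign categories 0..4 using a single common set of thresholds.
        return np.searchsorted(benchmark_thresholds, latent_index)

    latent_index = np.array([0.30, -0.55, 1.10])             # fitted latent indices
    benchmark_thresholds = np.array([-1.2, -0.4, 0.4, 1.0])  # benchmark cut-points

    print(rescore(latent_index, benchmark_thresholds))       # -> [2 1 4]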

In using the vignette approach, we make use of what has been termed the hierarchical ordered probit (HOPIT) model. The model makes it possible to partition observed differences in self-reported responses into differences due to reporting behaviour and genuine differences in the underlying latent construct under scrutiny. Technical details of the approach are provided in the two nominated outputs. The application of the approach to performance measurement is novel.
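
In stylised form (the notation below is our own shorthand, not a reproduction of the nominated outputs), the model combines an equation for the latent construct with covariate-dependent thresholds, and a vignette component rated on the same thresholds:

    Y_i^* = X_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, 1)
    Y_i = k \iff \tau_i^{k-1} < Y_i^* \le \tau_i^k, \qquad k = 1, \dots, 5
    \tau_i^1 = X_i'\gamma^1, \qquad \tau_i^k = \tau_i^{k-1} + \exp(X_i'\gamma^k), \quad k = 2, 3, 4
    Z_{ij}^* = \alpha_j + u_{ij}, \qquad u_{ij} \sim N(0, \sigma_u^2)

Because the vignette ratings depend on the thresholds but not on \beta, they identify the threshold parameters \gamma separately from genuine differences in the latent construct.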

Assumptions

Two assumptions underpin the vignette approach. First, response consistency implies that individuals rate the vignettes in a way that is consistent with the rating of their own experiences of health system responsiveness. A second assumption, vignette equivalence, implies that, conditional on the characteristics that determine reporting behaviour, for each vignette there exists an actual (unobserved) level of responsiveness on which all individuals agree (King et al., 2004). We have tested this latter assumption for the responsiveness module of the WHS.
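
In the notation of the model above, the two assumptions can be stated compactly (again in our own shorthand):

    Response consistency:  the thresholds \tau_i^k applied to the vignettes equal
                           those applied to the respondent's own experience.
    Vignette equivalence:  Z_{ij}^* = \alpha_j + u_{ij}, with \alpha_j common to
                           all respondents i (no respondent subscript on \alpha_j).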

Results

Our findings illustrate that systematic variation in reporting behaviour exists in the data both across individuals within countries and more prominently across countries. Results indicate that reporting behaviour is related to individual characteristics of income and education and to a lesser extent age and gender. Substantial variation in the reporting of responsiveness is observed across countries, but this is driven, in part, by differences in reporting behaviour. Accounting for differential reporting across countries and anchoring respondents’ ratings of experiences with health services to a common scale markedly influences the ranking of country performance.

Investigation of the assumptions underlying the vignette methodology lends support to their validity. Analyses of the WHS data indicate that the responsiveness module has face validity.

Discussion

Data on public sector performance are often categorical and self-reported, giving rise to the possibility of differential reporting behaviour, both within and, more notably, across countries. Information provided in vignettes offers the possibility of adjusting such data and anchoring them to a common scale, thereby affording a more comparable basis on which to undertake international analyses of performance. The ranking of countries using the raw data differs markedly from that obtained after adjusting for variation in reporting behaviour.

Our analyses therefore suggest that adjusting for variation in reporting behaviour is essential when undertaking cross-country comparative analyses based on self-reported survey data. We believe the use of anchoring vignettes can offer valuable insights into the reporting behaviour of individuals and has great utility as a method for promoting comparability in cross-country analyses.

International comparison continues to be a key instrument for initiating policy change. For comparisons to be credible, the extent to which individuals from diverse cultural backgrounds differ in their reporting of objective levels of performance needs to be considered. Only after such differences have been accounted for can informative comparative analyses take place.

Impacts

Our research provides methodological and policy insights. From a methodological perspective, our work makes a distinct contribution to the literature on performance measurement where data are self-reported. We have advanced the methodology of anchoring vignettes as a valid survey-based instrument to adjust self-reported data and we have undertaken methodological investigations of the underlying assumptions of the vignettes approach.

From a policy perspective, the results provide national and international organisations and policymakers with a deeper understanding of potential variations in the performance of public services across countries. They offer policy analysts a means to adjust and anchor comparative analyses to place them on a more equal footing. In addition, the work will inform the design of surveys aimed at establishing the comparative performance of public services.

Layman's description

Project Aims



International comparison of performance has become one of the most influential levers for change in the provision of public services. However, international comparison is intrinsically difficult. Perhaps one of the most challenging areas is the notion of health system ‘responsiveness’, which we define as the extent to which a health system meets user preferences in domains such as patient autonomy, choice and quality of amenities; it can be seen as an elaboration of the more widely discussed concept of ‘user satisfaction’.



The main objective of this study was to assess the usefulness of anchoring-vignette methods, applied to household survey data on individuals’ self-reported experiences of contact with health services, for securing international comparisons of public service responsiveness. In doing so we addressed the following main issues:



1. We interrogated individual-level data on health service responsiveness to investigate its suitability for cross-national comparison of health system performance.

2. We examined the degree to which individuals interpret the meaning of survey response categories in a way that systematically differs across populations and population sub-groups.

3. We explored the use of anchoring vignettes as a means to adjust survey reports of health system performance for differential reporting behaviour across countries and their ability to produce more comparable cross-country analyses.



Data



Perhaps the most ambitious attempt to date to measure and compare health system responsiveness is the World Health Survey (WHS). The WHS is an initiative launched by the WHO in 2001 aimed at strengthening national capacity to monitor critical health outputs and outcomes through the fielding of a valid, reliable and comparable household survey instrument. Seventy countries participated in the WHS, with samples drawn from nationally representative frames, resulting in sample sizes of between 600 and 10,000 respondents per country. Our analysis exploited the survey module on responsiveness, which is measured across eight domains, together with the set of anchoring vignettes contained in the WHS. Vignettes offer descriptions of the experiences of hypothetical individuals when accessing health services, covering the above domains.



Methods



It is natural for good or poor performance to mean different things to different people. Accordingly, for any given objective level of performance, individuals may differ in their subjective ratings. This is a source of differential reporting behaviour and is a phenomenon likely to be particularly pronounced when comparing performance across countries.



Our approach drew on information contained within respondents’ ratings of the hypothetical vignettes to explore the characteristics of systematic reporting behaviour across individuals, both within and across countries. This information was then used to adjust respondents’ self-reports of experiences with health services to produce ratings purged of differential reporting behaviour. Anchoring reporting behaviour to a common scale, independent of country, allowed us to produce more comparable cross-country rankings of health system responsiveness.

Key findings

Results



We cannot cover all results here – details can be found in the full report on the ESRC website.



Our findings illustrate that systematic variation in reporting behaviour exists in the data both across individuals within countries and more prominently across countries. Results indicate that reporting behaviour is related to individual characteristics of income and education and to a lesser extent age and gender. Substantial variation in the reporting of responsiveness is observed across countries, but this is driven, in part, by differences in reporting behaviour. Accounting for differential reporting across countries and anchoring respondents’ ratings of experiences with health services to a common scale markedly influences the ranking of country performance.



Investigation of the assumptions underlying the vignette methodology lends support to their validity. Analyses of the WHS data indicate that the responsiveness module has face validity.



Discussion



Data on public sector performance are often categorical and self-reported, giving rise to the possibility of differential reporting behaviour, both within and, more notably, across countries. Information provided in vignettes offers the possibility of adjusting such data and anchoring them to a common scale, thereby affording a more comparable basis on which to undertake international analyses of performance. The ranking of countries using the raw data differs markedly from that obtained after adjusting for variation in reporting behaviour.



Our analyses therefore suggest that adjusting for variation in reporting behaviour is essential when undertaking cross-country comparative analyses based on self-reported survey data. We believe the use of anchoring vignettes can offer valuable insights into the reporting behaviour of individuals and has great utility as a method for promoting comparability in cross-country analyses.



International comparison continues to be a key instrument for initiating policy change. For comparisons to be credible, the extent to which individuals from diverse cultural backgrounds differ in their reporting of objective levels of performance needs to be considered. Only after such differences have been accounted for can informative comparative analyses take place.



Impacts



Our research provides methodological and policy insights. From a methodological perspective, our work makes a distinct contribution to the literature on performance measurement where data are self-reported. We have advanced the methodology of anchoring vignettes as a valid survey-based instrument to adjust self-reported data and we have undertaken methodological investigations of the underlying assumptions of the vignettes approach.



From a policy perspective, the results provide national and international organisations and policymakers with a deeper understanding of potential variations in the performance of public services across countries. They offer policy analysts a means to adjust and anchor comparative analyses to place them on a more equal footing. In addition, the work will inform the design of surveys aimed at establishing the comparative performance of public services.



Dissemination activities



We have already been involved in the dissemination of research results to academic audiences (e.g. conferences, presentations, working papers, a journal article and a book chapter) and policy audiences (e.g. the World Health Organization and the Department of Health). In addition to the peer-reviewed journal article and book chapter accepted for publication, planned dissemination activities include the production of two further peer-reviewed journal articles, a “policy briefing” document for distribution to regulatory and policy-making agencies, and further presentations at policy and academic forums.

Status: Finished
Effective start/end date: 15/11/06 – 14/06/09

Funding

  • ECONOMIC AND SOCIAL RESEARCH COUNCIL (ESRC): £287,222.00

Keywords

  • H Social Sciences (General)
  • Reporting behaviour
  • Vignettes
  • HA Statistics
  • Hierarchical Ordered Probit Model
  • Health systems' performance