Chandra’s Distributed Peer Review

Rodolfo Montez Jr. and the Chandra Director’s Office

In Cycle 27, the Chandra Peer Review transitioned to a distributed peer review (DPR) model. In a DPR model, all proposing teams are required to review a subset of the proposals submitted to the call for proposals, and the resulting reviews are used to inform final proposal selections. The DPR model has been trialed by some funding agencies and is established practice at a few astronomical facilities (e.g., the European Southern Observatory and ALMA). Recent findings reported by the Research on Research Institute highlight the numerous benefits of the DPR model and the components essential to implementing it. This article describes Chandra's transition to the DPR model along with some outcomes from Cycle 27.

Why Now?

The Chandra Peer Review has undergone a variety of changes over the years as new challenges have arisen. Reviews through 2019 were held in-person (some readers may even remember week-long stays at Boston hotels with physical disks and printouts being ferried through the halls), but then the global pandemic expedited a transition to fully virtual reviews. A new challenge that has come with such virtual reviews has been maintaining flexibility while adhering to nominal operating hours. In one cycle, the composition and availability of panel members necessitated a meeting schedule that was both overnight for accompanying CXC staff and separated by weeks from other panels. The CXC has always been eager to meet challenges with thoughtful consideration for reviewers' time. We strongly believe the peer review process is essential for developing a highly impactful and effective Chandra science program cycle after cycle.

Further insight can be found in the historical reviewer and principal investigator (PI) information from Cycles 1–24. In that time, ~2900 unique PIs submitted ~15,500 proposals to the Peer Review. Digging into the reviewer information, however, we made a somewhat startling realization: over those twenty-four cycles, only ~950 PIs participated in at least one peer review, with another ~240 reviewers drawn from the population of those who had not yet been a PI on a Chandra proposal. Most striking of all, nearly 2000 PIs never participated in the Chandra Peer Review during those first twenty-four cycles. That is two-thirds of the total number of unique PIs of Chandra proposals.

It is not controversial to say Chandra PIs are the most qualified scientists to review other Chandra proposals. Yet based on historical information, we are not benefitting from this experience as much as we could be. In addition, given that our typical 8-person panel could have something in the range of 40–60 proposals to review, the effort provided by those ~1200 unique reviewers is substantial and laudable.

The low number of reviewing PIs is not for lack of trying. Every cycle, a team of CXC scientists, referred to as panel organizers, has spent the winter and spring months identifying and inviting potential reviewers for the Chandra Peer Review. Success rates for these reviewer invitations typically fall in the 20–30% range. We aim to secure a total of 100 reviewers each cycle, so the time and effort put in by the panel organizers is substantial in the face of these low success rates.

Switching to DPR addresses the challenge of maximizing reviewer experience and proposer participation while reducing the burden of recruitment by flipping the process: proposing teams are required to participate in the review, which brings in more experience and more participants and eliminates the need for recruitment campaigns. Each reviewer also sees only a small number of proposals over a longer period of time, reducing the demand on each person's time on both ends of the process.

Chandra's DPR Approach

The Chandra DPR approach is fairly straightforward, with only a few institution-specific nuances. Each proposing team has a representative who scores and performs initial reviews of a proposal set (10–16 proposals) within a set timeframe. A shorter second round follows, during which reviewers can view all of the other anonymized reviews provided on their proposal set. This second round gives reviewers an opportunity to adjust their proposal evaluations. The second-round DPR scores and reviews are used to inform final proposal rankings and recommendations.

All proposals submitted to the CfP undergo the distributed review process; however, the Target of Opportunity (TOO), Large Program (LP), and Very Large Program (VLP) proposals undergo additional review by traditional panels. Prior to the panels' meetings, panelists are provided with the DPR evaluation results as a part of the process by which they arrive at final recommendations for the selection official.

Who Reviews Proposals?

Each proposing team has one Designated Reviewer (DR) who is responsible for performing the proposal reviews by the specified deadline. The DR role is subject to the following workload limits:

While there is no limit to how many proposals a PI may submit, there is a limit on how many proposals an individual can review. If an individual appears as the DR on more than three proposals, the CXC reserves the right to ask the proposing team(s) to identify a new DR. This limit is necessary because we do not intend to overload reviewers, even when a PI submits a large number of proposals: asking an individual to review too many proposals risks reducing the quality of their reviews. Limiting the number of proposals on which an individual can serve as DR keeps the maximum number of proposals any one person is required to review at sixteen and helps maintain a quality review.

CXC Distributed Review Proposal Access

The Designated Reviewer accesses and reviews proposals through the Chandra Distributed Review Site (CDRS), which requires a Chandra User account (see CXC Account) and completion of a non-disclosure agreement. At the site, reviewers are presented with the list of proposals they are required to review. In the initial week of assignments, reviewers are asked to report immediately any potential conflicts that were not caught by our conflict review prior to assignment. As part of the assignment process, we use each reviewer's scientific area (based on their proposal and publication record) to identify proposals they are well placed to review. Some proposals may still fall outside a reviewer's area, but we optimize the matching to limit the number of such assignments while balancing ideal matches against reviewer load.

Reviewer assignment begins with determining a suitability/similarity score for each reviewer–proposal pair. We optimize the reviewer–proposal similarity matrix to give every proposal a similar spectrum of suitable reviewers while mitigating identifiable conflicts of interest. We then assign reviewers so that each proposal receives at least ten reviews while no individual Designated Reviewer is assigned more than sixteen.
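For illustration, the sketch below shows one way such a constrained assignment could be made with a simple greedy pass. It assumes a reviewer-by-proposal suitability matrix and a boolean conflict matrix, and hard-codes the ten-review threshold and sixteen-review cap described above; it is not the CXC's actual matching software.

```
import numpy as np

def assign_reviewers(similarity, conflicts, reviews_per_proposal=10, max_per_reviewer=16):
    """Greedy assignment sketch.

    similarity : (n_reviewers, n_proposals) array of suitability scores.
    conflicts  : boolean array of the same shape; True marks reviewer-proposal
                 pairs that must never be assigned (conflicts of interest).
    """
    n_reviewers, n_proposals = similarity.shape
    load = np.zeros(n_reviewers, dtype=int)           # reviews assigned per reviewer
    assignments = {p: [] for p in range(n_proposals)}

    for p in range(n_proposals):
        # Consider candidate reviewers for this proposal, most suitable first.
        for r in np.argsort(-similarity[:, p]):
            if len(assignments[p]) == reviews_per_proposal:
                break
            # Skip conflicted reviewers and reviewers already at the cap.
            if conflicts[r, p] or load[r] >= max_per_reviewer:
                continue
            assignments[p].append(int(r))
            load[r] += 1
        if len(assignments[p]) < reviews_per_proposal:
            raise RuntimeError(f"proposal {p} cannot reach {reviews_per_proposal} reviews")
    return assignments, load
```

A greedy pass like this favors the proposals handled first; the actual matching balances ideal matches and reviewer load across all proposals simultaneously, as described above.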

In Cycle 27, during the first Chandra distributed review, we had 233 unique Designated Reviewers reviewing 299 proposals. Based on a comparison of the first- and second-round scores, we surmised that a small number of reviewers (2–3) reversed the sense of the scoring scale, essentially flipping their scores in the second round. We also found that 70% of the reviewers changed at least one score in the second round.
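As an illustration of how such a comparison might be made (a hypothetical diagnostic, not necessarily the one we used, and the -0.5 threshold is arbitrary), one can correlate each reviewer's Round 1 and Round 2 scores and flag any reviewer whose two rounds are strongly anti-correlated:

```
import numpy as np

def flag_reversed_reviewers(round1_scores, round2_scores, threshold=-0.5):
    """Flag reviewers whose Round 1 and Round 2 scores are strongly
    anti-correlated, suggesting the scoring scale may have been flipped.

    round1_scores, round2_scores : dicts mapping reviewer -> list of scores
    for their assigned proposals, in the same proposal order in both rounds.
    """
    flagged = []
    for reviewer, r1 in round1_scores.items():
        r2 = round2_scores[reviewer]
        # Correlation is undefined if a reviewer gave identical scores throughout.
        if np.std(r1) == 0 or np.std(r2) == 0:
            continue
        corr = np.corrcoef(r1, r2)[0, 1]
        if corr < threshold:
            flagged.append((reviewer, corr))
    return flagged
```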

[Figure 1: two histograms comparing Round 1 and Round 2 scores. Left panel: Number of Score Comparisons versus Reviewer Uncertainty (0–5); both rounds peak near zero at roughly 3500 comparisons, fall to about 2000 at an uncertainty of 1, and drop to near zero beyond an uncertainty of 2, with a small shift toward lower values in Round 2. Right panel: Number of Proposals versus Average Reviewer Uncertainty per Proposal (0–2); both rounds are unimodal with a tail toward higher uncertainty, Round 1 peaking just above 0.75 and Round 2 just below, with the Round 2 distribution narrower and less skewed.]

Figure 1: Reviewer Uncertainty Measurements from Cycle 27 Chandra Distributed Peer Review. Reviewer uncertainty is determined through comparisons of reviewer scores for a given proposal.

Reviewer Uncertainty is a measurement used to evaluate agreement across the reviewers' scores. For a given proposal, the scores are compared pair-wise and the absolute separation between each pair of scores is calculated. Since scores in the Chandra DPR range from 0 (worst) to 5 (best), the maximum Reviewer Uncertainty is 5 (the most uncertain case, with one reviewer judging a proposal infeasible and another judging it the best) and the minimum is 0 (the least uncertain case, with two reviewers agreeing on a score, regardless of its qualitative value). We determined the Reviewer Uncertainty for all proposals in the DPR, as well as the average Reviewer Uncertainty per proposal, for both the Round 1 and Round 2 scores. Histograms of the Reviewer Uncertainty and the average Reviewer Uncertainty per proposal are presented in Figure 1. The plots indicate relatively low reviewer uncertainty, with a mean value of ~0.9. There is marginal evidence that reviewer uncertainty decreased between Rounds 1 and 2, but the reduction is quite small (0.87 to 0.83, a 5% change), and the few reviewers who may have reversed their scores also affect the Reviewer Uncertainty.
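Concretely, the quantities plotted in Figure 1 can be computed from the definition above roughly as follows (a sketch of the calculation; the actual implementation may differ in detail):

```
from itertools import combinations

def reviewer_uncertainty(scores):
    """Pairwise absolute separations between reviewer scores for one proposal.
    Scores range from 0 (worst) to 5 (best)."""
    return [abs(a - b) for a, b in combinations(scores, 2)]

def average_uncertainty(scores):
    """Average Reviewer Uncertainty for one proposal."""
    pairs = reviewer_uncertainty(scores)
    return sum(pairs) / len(pairs)

# Perfect agreement gives 0; a 0-versus-5 disagreement gives the maximum of 5.
print(average_uncertainty([3, 3, 3]))   # 0.0
print(reviewer_uncertainty([0, 5]))     # [5]
```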

Determining Selections

Regardless of the process, participants in the Chandra Peer Review do not themselves select the proposals to be awarded time; rather, they make recommendations to the Selection Official, the Director of the Chandra X-ray Observatory, currently Pat Slane. The Selection Official uses the recommendations while making final selections that balance scientific merit, subject demand, technical feasibility, and other available resources (including Resource Cost, TOO triggers, and High Ecliptic Latitude Time). For the first DPR, all non-V/LP/TOO proposals were sorted into five main categories that mirror the historical categories used over the decades of past Chandra Peer Reviews: (1) Stars, (Exo-)Planets, and Solar System Objects, (2) Black Holes, Neutron Stars, and Binary Systems, (3) Supernovae, Supernova Remnants, and Isolated Neutron Stars, (4) Galaxy Clusters and Diffuse (Extra-)Galactic Emission, and (5) Active Galactic Nuclei. The demand in each category was determined by the total requested time, and a preliminary allocation was made for each category based on that demand.
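As a rough illustration of a demand-based preliminary allocation (the category labels and requested times below are invented, and the actual allocation may weigh additional factors), one could split the available time in proportion to each category's requested time:

```
def preliminary_allocation(requested_time, available_time):
    """Split available observing time across categories in proportion to demand.
    requested_time : dict mapping category -> total requested time (e.g., in Ms)."""
    total_demand = sum(requested_time.values())
    return {category: available_time * demand / total_demand
            for category, demand in requested_time.items()}

# Hypothetical requested times (Ms) for the five categories:
demand = {"Stars/Planets": 8.0, "BH/NS/Binaries": 16.0, "SNe/SNRs": 7.0,
          "Clusters/Diffuse": 12.0, "AGN": 17.0}
print(preliminary_allocation(demand, available_time=20.0))
```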

The aggregate DPR scores were then used to provide an initial ranking of the proposals in each category. We used a variety of metrics to determine where the initial ranking was potentially dubious, that is, where closely scored proposals were effectively tied. For these proposals, we studied the individual scores in more detail to carefully evaluate the ranking of statistically tied proposals, and we considered the ties in decision trees when finalizing selections. Typically, the proposals in the first quintile were clearly separated from the others and fell within the category's allocation. In the second quintile, the precise ranking mattered more and was used by the Selection Official to select the successful proposals.
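One simple way to flag potentially dubious rankings, shown here purely as an illustration of the idea rather than the metrics actually used, is to treat adjacent proposals as statistically tied when the gap between their mean scores is small compared with the combined standard error of those means:

```
import numpy as np

def tied_pairs(proposal_scores, ranking, n_sigma=1.0):
    """Flag adjacent proposals in a ranking whose mean scores differ by less
    than n_sigma combined standard errors of the mean.

    proposal_scores : dict mapping proposal id -> list of reviewer scores.
    ranking         : list of proposal ids, sorted by aggregate score.
    """
    ties = []
    for a, b in zip(ranking, ranking[1:]):
        sa = np.asarray(proposal_scores[a], dtype=float)
        sb = np.asarray(proposal_scores[b], dtype=float)
        gap = abs(sa.mean() - sb.mean())
        sem = np.sqrt(sa.var(ddof=1) / len(sa) + sb.var(ddof=1) / len(sb))
        if gap < n_sigma * sem:
            ties.append((a, b))
    return ties
```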

To the Future

After the initial rollout in Cycle 27, the CXC plans to continue using distributed peer review for Chandra's annual call for proposals. Overall, the criteria and procedures described in the CfP for the Chandra DPR have not changed, but we are making some adjustments. From the initial run of the distributed peer review model, we have identified pressure points where the timeline should be adjusted. Our communication with Designated Reviewers will be improved to avoid confusion in the scoring and to help improve report content. The matching software we use to pair reviewers and proposals has been improved and will likely continue to improve with every iteration. There will now be four scoring categories instead of one: Scientific Merit, Use of Chandra Capabilities, Technical Justification, and Clarity of Proposal. We believe these additional scoring categories will allow reviewers to provide more specific feedback to proposers. Our hope is that distributed peer review will improve the efficiency of the Chandra review process and build and strengthen the Chandra User community.