Methodology: Topological Risk Analysis and BRS Model

1. Data Collection and Network Modeling

The project models the NPM ecosystem as a directed graph:

Sampling Strategy

Instead of analyzing the entire NPM ecosystem (3M+ packages), a Combined Sampling Strategy was followed to capture both infrastructural backbone and popular end-user packages:

  1. Seed Set: The top 1,000 packages by Dependents (infrastructure) AND the top 1,000 packages by Downloads (popularity) were selected from ecosyste.ms.
  2. Merging: These lists were combined, resulting in ~1,452 unique seed packages.
  3. Depth Scanning: The dependency tree of these seeds was scanned up to depth level 7.
  4. Preprocessing: Circular dependencies were cleaned and isolated nodes were filtered out.

As a result, a directed graph of 1,844 nodes and 3,814 edges (after full traversal) was constructed.

2. Centrality Metrics

Three fundamental metrics were used to determine the importance of packages within the network:

Metric Definition Risk Context Meaning
In-Degree The count of incoming dependencies. Direct Impact Radius: Indicates popularity and the immediate number of projects affected by a vulnerability ($w=0.30$).
Out-Degree The count of outgoing dependencies. Attack Surface: A high out-degree suggests an expanded surface for upstream attacks ($w=0.10$).
Betweenness Frequency of the node appearing on shortest paths between others. Bridge Role: Quantifies the package's role as a network "bridge" controlling information and risk flow ($w=0.35$).
Inverted Clustering $1 - \text{ClusteringCoefficient}$ Local Fragility: Captures structural holes. Nodes bridging distinct clusters without redundant connections pose higher risk ($w=0.15$).

3. Behavioral Risk Score (BRS) v3

A single metric is insufficient to express complex supply chain risks. Therefore, a Behavioral Risk Score (BRS) was developed through weighted combination of structural and popularity metrics.

Normalization

Each metric is normalized to the [0,1] range using the Min-Max method. To focus on orders of magnitude rather than raw counts, skewed metrics (Dependents, Downloads, Staleness) are Log-Normalized before scaling:

x' = [ln(1+x) - min(ln(1+x))] / [max(ln(1+x)) - min(ln(1+x))]

Formula (Structural Model)

The v3 model prioritizes strategic position (Betweenness) and impact radius (In-Degree), while accounting for local fragility (Inverted Clustering).

BRS = 0.35 ยท Betweenness' + 0.30 ยท In-Degree'
+ 0.15 ยท ClusteringInv' + 0.10 ยท Out-Degree'
+ 0.05 ยท Dependents' + 0.05 ยท Downloads'

Weight Rationale

4. Robustness and Cascade Analysis

The model's validity was tested through targeted attack simulations: