NPM Supply Chain Network Analysis and Criticality Mapping

Mapping systemic risks in the NPM ecosystem through content-independent topological analysis.

Centralized package managers like NPM have transformed the software ecosystem into a complex and fragile structure. Current security approaches often fail to detect systemic risks stemming from the network's architecture. This study aims to map these risks using topological analysis methods independent of package content.

We constructed a directed graph from the top 1,000 packages by number of dependents (infrastructure) and by popularity (downloads), following their dependencies to a depth of 7. From metrics such as In-degree, Betweenness, and Inverted Clustering, we derived the Behavioral Risk Score (BRS) to quantify each package's structural criticality.
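
To make the three metrics concrete, here is a minimal, standard-library sketch on a toy graph. It is not the project's implementation. The package names are hypothetical, "Inverted Clustering" is assumed to mean 1 minus the local clustering coefficient of the undirected graph, and the equal-weight combination of min-max normalized metrics is only a stand-in, since the actual BRS formula is not given here.

```python
from collections import deque

# Toy graph with hypothetical package names; an edge u -> v means "u depends on v".
GRAPH = {
    "app-a": ["util-x", "core"],
    "app-b": ["util-x"],
    "util-x": ["core"],
    "core": [],
}

def in_degree(graph):
    """Number of packages that depend directly on each node."""
    counts = {node: 0 for node in graph}
    for deps in graph.values():
        for dep in deps:
            counts[dep] += 1
    return counts

def betweenness(graph):
    """Unnormalized betweenness for an unweighted directed graph (Brandes' algorithm)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack, queue = [], deque([s])
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                     # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

def inverted_clustering(graph):
    """ASSUMPTION: 1 - local clustering coefficient, ignoring edge direction."""
    nbrs = {node: set() for node in graph}
    for u, deps in graph.items():
        for v in deps:
            if u != v:
                nbrs[u].add(v)
                nbrs[v].add(u)
    result = {}
    for node, ns in nbrs.items():
        k = len(ns)
        links = sum(1 for a in ns for b in ns if a < b and b in nbrs[a])
        coeff = 2 * links / (k * (k - 1)) if k >= 2 else 0.0
        result[node] = 1.0 - coeff
    return result

def min_max(values):
    """Rescale a metric to [0, 1]; a constant metric maps to 0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}

def risk_scores(graph, weights=(1 / 3, 1 / 3, 1 / 3)):
    """HYPOTHETICAL composite score: weighted sum of the normalized metrics."""
    metrics = [min_max(m(graph)) for m in (in_degree, betweenness, inverted_clustering)]
    return {n: sum(w * m[n] for w, m in zip(weights, metrics)) for n in graph}

scores = risk_scores(GRAPH)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.3f}")
```

On real data a graph library such as networkx would normally replace the hand-written metric functions; they are spelled out here only to keep the sketch dependency-free.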

🔗 Live Preview: yusufarbc.github.io/npm-supply-chain-network-analysis

💡 Key Findings

This study presents critical insights into the topological structure of the NPM ecosystem:

📚 Documentation and Background

For the theoretical foundation of the project and case analyses, please review the following documents:

🚀 Quick Start

Prerequisites

Installation

  1. Clone the repository and navigate to the directory:
    git clone https://github.com/yusufarbc/npm-supply-chain-network-analysis.git
    cd npm-supply-chain-network-analysis
  2. Set up and activate the virtual environment (Windows PowerShell):
    python -m venv .venv
    .\.venv\Scripts\Activate.ps1
  3. Install dependencies:
    pip install -r analysis/requirements.txt
    python -m pip install notebook
  4. Start the analysis:
    python -m notebook
    # Open the analysis/analysis.ipynb file

📊 Usage (Pipeline)

The analysis engine runs through analysis/run_pipeline.py. You can perform a complete analysis by running the first cell in the notebook.

from analysis.run_pipeline import run_pipeline

# Default configuration: top 1000 packages per leaderboard category, dependencies followed to depth 7
result = run_pipeline(
    top_n=1000,                    # Number of packages (per leaderboard category)
    leaderboard_mode="combined",    # Mode: combined (dependents + downloads)
    depth=7,                        # Scanning depth
    results_dir="../results",      # Output directory
    compute_plots=True              # Generate plots
)
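
The depth parameter bounds how far transitive dependencies are followed from the seed packages. The sketch below illustrates one way such a limit can work, assuming a breadth-first traversal. The function name `expand_dependencies` and the in-memory `TOY_DEPS` lookup are hypothetical and are not taken from analysis/run_pipeline.py, which would query real package metadata instead.

```python
from collections import deque

# HYPOTHETICAL stand-in for a registry lookup: package -> direct dependencies.
TOY_DEPS = {
    "a": ["b"],
    "b": ["c"],
    "c": ["d"],
    "d": [],
}

def expand_dependencies(seeds, get_deps, max_depth):
    """Breadth-first expansion from the seed packages.

    Returns (depth_of, edges): the depth at which each package was first
    reached, and the set of (package, dependency) edges discovered.
    Packages already at max_depth are kept as nodes but not expanded further.
    """
    depth_of = {seed: 0 for seed in seeds}
    edges = set()
    queue = deque(seeds)
    while queue:
        pkg = queue.popleft()
        if depth_of[pkg] >= max_depth:
            continue
        for dep in get_deps(pkg):
            edges.add((pkg, dep))
            if dep not in depth_of:
                depth_of[dep] = depth_of[pkg] + 1
                queue.append(dep)
    return depth_of, edges

depth_of, edges = expand_dependencies(["a"], lambda p: TOY_DEPS.get(p, []), max_depth=2)
print(depth_of)
print(sorted(edges))
```

Because each package is recorded at the depth where it is first reached, shared dependencies are expanded only once, which keeps the traversal linear in the number of edges visited.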

Analysis Modes

| Mode | Parameter | Description | Use Case |
|---|---|---|---|
| Most Dependent | dependents | Most depended-upon packages | Critical infrastructure analysis (default) |
| Most Downloaded | downloads | Most downloaded packages | General popularity and traffic analysis |
| Trending | trending | Rapidly rising packages | Early warning and anomaly detection |

📂 Project Structure

📜 License

This project is licensed under the MIT License.