INDEX
    Explanations

    ones with non-zero activation values

    elements related to coding or programming concepts

    Negative Logits
     eleph          -0.92
    hement          -0.73
    ÃĥÃĤÃĥÃĤ        -0.70
     newcom         -0.69
     pione          -0.67
    aditional       -0.66
     occas          -0.66
     proport        -0.64
     Burnett        -0.64
     undermin       -0.62

    Positive Logits
    Requirements    0.98
    Testing         0.93
                    0.92
    Async           0.91
    ³³³             0.91
    github          0.90
    Installation    0.88
    Deploy          0.84
    package         0.84
    Usage           0.84
    Act Density 0.266%

    No Known Activations