    NEGATIVE LOGITS
     Hier             -0.74
     masks            -0.72
     Masquerade       -0.71
     Rabbit           -0.70
     Span             -0.69
     Reconstruction   -0.68
     Barrier          -0.68
     Mask             -0.67
     FIG              -0.65
     Zombies          -0.64

    POSITIVE LOGITS
    @                  2.20
    gerald             1.01
    contact            1.01
    reports            1.01
    cott               0.96
    jamin              0.95
    john               0.95
    christ             0.94
    espie              0.92
    iries              0.92
    Activation Density: 0.075%

    No Known Activations