INDEX
    Explanations

    mentions of significant cultural events or figures

    Negative Logits
    Token               Logit
    "anch"              -0.17
    "wheel"             -0.15
    " Reds"             -0.15
    "628"               -0.14
    "hani"              -0.14
    " multinational"    -0.14
    "ino"               -0.13
    " Buckley"          -0.13
    "annies"            -0.13
    "eth"               -0.13
    Positive Logits
    Token               Logit
    "essim"             0.19
    "iani"              0.19
    "emann"             0.15
    "ermann"            0.15
    "omba"              0.14
    "abbo"              0.14
    "лами"              0.14
    "-addons"           0.14
    "erman"             0.14
    "trecht"            0.14
    Activation Density: 0.664%

    No Known Activations