INDEX
    Explanations

    mentions of groups, communities, or collectives involving people or entities

    Negative Logits
    Token       Logit
    -animate    -0.15
     cons       -0.15
    andest      -0.14
    enou        -0.14
    ppo         -0.14
    /ec         -0.14
    iani        -0.14
    еко         -0.14
    irst        -0.13
     pump       -0.13
    Positive Logits
    Token       Logit
    RITE        0.19
    lak         0.17
    aux         0.16
    tere        0.16
    ball        0.15
    both        0.15
    ussels      0.15
     Ludwig     0.15
     Both       0.14
    ele         0.14
    Act Density 0.389%

    No Known Activations