    Negative Logits
    Token                   Logit
    [token not captured]    -0.09
    " FAG"                  -0.09
    [token not captured]    -0.09
    "jate"                  -0.08
    " compagnon"            -0.08
    " Ари"                  -0.08
    " Fiat"                 -0.08
    "-Spiel"                -0.08
    " Aziz"                 -0.08
    " lada"                 -0.08
    Positive Logits
    Token                   Logit
    " protest"              0.07
    " pumps"                0.07
    " Hmm"                  0.07
    "esp"                   0.07
    " disturbances"         0.07
    " people"               0.07
    " flower"               0.07
    "empo"                  0.07
    [token not captured]    0.07
    "Hmm"                   0.07
    Activation Density 0.028%

    No Known Activations