INDEX
    Explanations

    phrases indicating blame or accusation

    Negative Logits
    Token       Logit
    "LOCAL"     -0.07
    "ANGES"     -0.06
    "pellier"   -0.06
    "ruk"       -0.06
    " LOCAL"    -0.06
    " лок"      -0.06
    "hev"       -0.06
    "roj"       -0.06
    "DNA"       -0.06
    " Local"    -0.06
    Positive Logits
    Token              Logit
    "ugins"            0.08
    "ugin"             0.08
    "ington"           0.07
    "dfa"              0.07
    "olini"            0.06
    "metis"            0.06
    " Revolutionary"   0.06
    " verb"            0.06
    "æ°"               0.06
    " NEC"             0.06
    Act Density 0.000%

    No Known Activations