INDEX
    Explanations

    phrases related to accusations and social discourse

    Negative Logits
    oral
    -0.18
    plit
    -0.16
    inh
    -0.15
    oren
    -0.15
    uta
    -0.14
    oker
    -0.14
    olas
    -0.13
     mand
    -0.13
     Finn
    -0.13
    zag
    -0.13
    Positive Logits
     Berger
    0.16
    گان
    0.15
    argo
    0.15
    jest
    0.15
    ampaign
    0.14
    истра
    0.14
    ارت
    0.14
    addon
    0.14
     Khal
    0.14
    ereco
    0.13
    Act Density 0.120%

    No Known Activations