INDEX
    Explanations

    phrases indicating feelings of guilt or accusations of wrongdoing

    Negative Logits
    "TRACE"     -0.08
    "erli"      -0.08
    " Trace"    -0.08
    "bay"       -0.08
    "悠"        -0.08
    " лож"      -0.07
    "ë³"        -0.07
    "è½"        -0.07
    "Trace"     -0.07
    " улуч"     -0.07
    Positive Logits
    " catch"         0.06
    " both"          0.06
    "islav"          0.06
    "ilver"          0.06
    " definition"    0.06
    "catch"          0.06
    "mith"           0.06
    "cher"           0.06
    " revis"         0.06
    "opyright"       0.05
    Activation Density: 0.002%

    No Known Activations