INDEX
    Explanations

    statements of blame or self-recrimination

    Negative Logits
     oprot           -0.88
    Iné              -0.68
     ―――――           -0.66
    NUMX             -0.66
     незавершена     -0.65
     Houſe           -0.65
    cknow            -0.65
     Monfieur        -0.65
     NDEBUG          -0.65
     itſelf          -0.64
    Positive Logits
    !                0.57
    !!!              0.55
    !!               0.53
    !!!!             0.52
    !”               0.50
    !"               0.48
     !               0.47
    ufen             0.44
     inspiração      0.44
     C               0.44
    Act Density 0.193%

    No Known Activations