INDEX
    Explanations

    statements expressing honesty or frankness

    Negative Logits
    eynman
    -0.65
     réfugiés
    -0.64
     برانيه
    -0.64
    AndWait
    -0.63
    fpm
    -0.63
    aguya
    -0.63
    :✨
    -0.60
     порядка
    -0.59
    льше
    -0.59
     bénévoles
    -0.59
    Positive Logits
    Frankly
    0.96
     frankly
    0.95
    Honestly
    0.92
     Honestly
    0.83
     honestly
    0.83
     disambiguazione
    0.71
    说实话
    0.68
    honestly
    0.64
     admit
    0.61
     tbh
    0.60
    Activation Density: 0.110%

    No Known Activations