INDEX
    Explanations

    descriptive labels/qualities

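    Negative Logits
        " intolerable"       0.46
        "素晴らしい"          0.46   (Japanese, "wonderful")
        " extraordinaria"    0.45   (Spanish, "extraordinary")
        " extraordinary"     0.45
        " অসাধারণ"           0.44   (Bengali, "extraordinary")
        " unbearable"        0.42
        " horrendous"        0.41
        "素晴"               0.41   (partial token of 素晴らしい)
        " horrifying"        0.40
        "incredible"         0.40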
    Positive Logits
        " slightly"          1.09
        " Slightly"          0.95
        "slightly"           0.92
        " upbeat"            0.90
        " légèrement"        0.86   (French, "slightly")
        " playful"           0.83
        " approachable"      0.78
        " straightforward"   0.75
        " breezy"            0.74
        " somewhat"          0.73
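Dashboards of this kind commonly build the Positive and Negative Logits lists with one method. They project the feature's decoder direction through the model's unembedding matrix, then take the highest- and lowest-scoring tokens. This page does not say how its values were computed, and it does not explain why the negative entries appear without a minus sign. The sketch below therefore assumes the projection method. The function name `top_logit_effects`, the dict-of-lists unembedding, and the toy vectors are all hypothetical.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]


def top_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[Scored, Scored]:
    """Hypothetical helper: rank tokens by how strongly a feature's
    decoder direction pushes their logits up or down.

    Assumes `unembed` maps each token to its unembedding vector, and
    that the direct logit effect is a plain dot product."""
    scores: Dict[str, float] = {}
    for tok, vec in unembed.items():
        if len(vec) != len(decoder_dir):
            raise ValueError(f"dimension mismatch for token {tok!r}")
        scores[tok] = sum(d * u for d, u in zip(decoder_dir, vec))

    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]          # most negative scores first
    positive = ranked[::-1][:k]    # most positive scores first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-d vectors, invented for illustration only.
    toy_unembed = {
        " slightly": [1.0, 0.0],
        " upbeat": [0.5, 0.5],
        " intolerable": [-1.0, 0.0],
    }
    pos, neg = top_logit_effects([1.0, 0.0], toy_unembed, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

The sketch returns signed scores. A dashboard that lists negative logits as magnitudes would display `abs(score)` instead.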
    Activation Density: 0.292%
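Activation density is usually the percentage of sampled tokens on which the feature fires. The page states neither its firing threshold nor its sample size. The minimal sketch below assumes "fires" means an activation strictly above 0, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose feature activation
    is strictly greater than `threshold` (assumed default: 0.0)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 292 active tokens out of 100,000 sampled -> 0.292 (percent)
    sample = [1.0] * 292 + [0.0] * (100_000 - 292)
    print(f"{activation_density(sample):.3f}%")
```

Under that reading, a density of 0.292% means the feature is active on roughly 292 of every 100,000 sampled tokens.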

    No Known Activations