Explanations
negative or critical sentiments and expressions related to personal or societal issues
Negative Logits
TagMode         -0.65
                -0.60
utafitiHapana   -0.59
EconPapers      -0.57
fjspx           -0.55
>",             -0.54
:✨              -0.53
HasForeignKey   -0.53
surla           -0.53
Cares           -0.53
Positive Logits
fucking         0.77
damn            0.73
🤬              0.66
fucking         0.66
FUCKING         0.64
worthless       0.62
😡              0.62
terrified       0.62
damned          0.60
dangerous       0.59
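The two lists above presumably rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). The following is a minimal sketch of how such lists could be produced. It assumes that each token's score is the feature's decoder direction projected onto the unembedding matrix. The names `decoder_dir`, `W_U`, `logit_effects`, and `top_bottom_tokens` are hypothetical and do not come from the dashboard's actual code.

```python
import numpy as np


def logit_effects(decoder_dir, W_U):
    """Project a feature's decoder direction onto the unembedding.

    Assumption: decoder_dir has shape (d_model,) and W_U has shape
    (d_model, vocab_size); the result holds one score per vocabulary token.
    """
    return np.asarray(decoder_dir, dtype=float) @ np.asarray(W_U, dtype=float)


def top_bottom_tokens(scores, vocab, k=10):
    """Return the k lowest- and k highest-scoring (token, score) pairs."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="stable")  # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with made-up values (not taken from this feature).
vocab = ["a", "b", "c", "d", "e"]
scores = [0.5, -0.2, 0.1, -0.7, 0.9]
neg, pos = top_bottom_tokens(scores, vocab, k=2)
print(neg)  # [('d', -0.7), ('b', -0.2)]
print(pos)  # [('e', 0.9), ('a', 0.5)]
```

Repeated surface forms such as the two `fucking` entries are most likely distinct tokens, for example with and without a leading space. A per-token ranking like this one would list them separately.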
Activation density: 2.151%
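Activation density is commonly read as the share of tokens in a sample on which the feature fires. The following is a minimal sketch under that assumption. The firing threshold of 0 and the function name `activation_density` are hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy activations over 8 tokens (made-up values); 2 of the 8 are active.
print(activation_density([0, 0, 1.2, 0, 0.3, 0, 0, 0]))  # 25.0
```

Under this reading, a density of 2.151% means the feature is active on roughly 2,151 of every 100,000 tokens.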