INDEX
Explanations
words related to extreme negativity or criticism
the repeated use of the word "utter" in various contexts
Negative Logits
llah        -0.75
actionDate  -0.75
amsung      -0.75
xual        -0.74
OHN         -0.72
pei         -0.69
deen        -0.68
Feder       -0.67
ysc         -0.66
dule        -0.66
Positive Logits
ances       1.16
ance        0.99
most        0.89
aneous      0.83
TY          0.80
utter       0.79
amaz        0.75
ables       0.73
ing         0.73
ancing      0.73
Activations Density: 0.016%