INDEX
Explanations
words related to grievances or complaints
Negative Logits
aqu      -0.19
eut      -0.15
tains    -0.15
xfa      -0.15
nero     -0.14
Hab      -0.14
onso     -0.14
izard    -0.14
aç       -0.14
habit    -0.14
Positive Logits
ovsky    0.17
asje     0.16
Giang    0.16
тверд    0.15
терн     0.15
.mob     0.14
stell    0.14
اى       0.14
igo      0.14
.bias    0.14
Activations Density 0.018%