INDEX
Explanations
words related to disagreements or conflicts
Negative Logits
Token      Logit
emale      -0.80
éĸ         -0.72
ãĥĺãĥ©     -0.70
ãĥķãĤ©     -0.68
undai      -0.67
uilt       -0.67
URA        -0.65
srf        -0.65
SEA        -0.64
eele       -0.64
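Several of the tokens above (for example `ãĥĺãĥ©`) look like encoding debris, but they may simply be raw vocabulary strings. The sketch below assumes, without confirmation from this page, that the underlying model uses a GPT-2 style byte-level BPE vocabulary, in which every byte is displayed as a printable stand-in character. Under that assumption the strings can be mapped back to bytes and decoded as UTF-8. The helper names `token_to_bytes` and `token_to_text` are hypothetical and chosen only for illustration.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table in the style of GPT-2 byte-level BPE."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    # Printable bytes map to themselves.
    table = {b: chr(b) for b in printable}
    # Remaining bytes (control chars, space, etc.) are shifted to U+0100 and up.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed vocabulary string."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


def token_to_text(token: str) -> str:
    """Decode the raw bytes as UTF-8; partial sequences become U+FFFD."""
    return token_to_bytes(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥĺãĥ©", "ãĥķãĤ©", "éĸ", "emale"]:
        print(repr(tok), token_to_bytes(tok), repr(token_to_text(tok)))
```

If the assumption holds, `ãĥĺãĥ©` decodes to the katakana `ヘラ` and `ãĥķãĤ©` to `フォ`, while `éĸ` is only the first two bytes (`E9 96`) of a three-byte UTF-8 character and so has no standalone text form. Plain ASCII tokens such as `emale` pass through unchanged.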
Positive Logits
Token       Logit
ABOUT       1.07
about       1.05
nonsense    0.97
spew        0.88
aloud       0.87
regarding   0.86
antics      0.83
endlessly   0.82
hyster      0.80
concerning  0.80
Activations Density: 0.200%