INDEX
Explanations
phrases referring to specific groups of individuals and their actions or characteristics
phrases that typically start with "some people" indicating opinions or behaviors of individuals
Negative Logits
"=~"          -0.86
"ãĤ´"         -0.84
"enegger"     -0.80
"è£ıç"        -0.71
"ĸļ"          -0.71
"Anyone"      -0.69
"--------------------------------------------------------"  -0.69
"-+"          -0.68
"Anyone"      -0.67
"————————"    -0.66
Positive Logits
"hops"          0.70
"rooms"         0.70
"downright"     0.69
"creep"         0.68
"hop"           0.66
"wiser"         0.66
"outright"      0.64
"worse"         0.63
"incorrectly"   0.63
"lim"           0.63
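Lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and keeping the k lowest and k highest entries. The sketch below assumes exactly that (logit for token t = dot product of the decoder vector with column t of W_U) and omits any normalization or final layer norm a real dashboard may apply. The function names, toy vocabulary, and matrix values are hypothetical, for illustration only.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]

def feature_logits(decoder_vector: Sequence[float],
                   unembed: Sequence[Sequence[float]]) -> List[float]:
    """Logit for each vocab token t: sum_i decoder_vector[i] * unembed[i][t].

    Assumption: `unembed` has shape (d_model, vocab_size), like W_U.
    """
    d_model = len(decoder_vector)
    vocab_size = len(unembed[0])
    return [sum(decoder_vector[i] * unembed[i][t] for i in range(d_model))
            for t in range(vocab_size)]

def top_and_bottom(tokens: Sequence[str], logits: Sequence[float],
                   k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most negative and the k most positive (token, logit) pairs."""
    ranked = sorted(zip(tokens, logits), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return negative, positive

# Toy example (hypothetical values): d_model = 2, vocab_size = 4.
tokens = ["hops", "rooms", "Anyone", "=~"]
W_U = [[1.0, 0.0, -1.0, -2.0],
       [0.0, 1.0, 0.0, 1.0]]
decoder = [0.5, 0.2]
neg, pos = top_and_bottom(tokens, feature_logits(decoder, W_U), k=2)
print("negative:", neg)
print("positive:", pos)
```

With these toy numbers, "=~" and "Anyone" come out as the most negative tokens and "hops" and "rooms" as the most positive.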
Activation Density: 0.310%
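Activation density is usually read as the share of token positions on which the feature fires. A minimal sketch under that assumption, treating "fires" as an activation strictly above a threshold of zero; the function name, threshold default, and toy data are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy data (hypothetical): 1,000 token positions, 3 of which fire.
acts = [0.0] * 997 + [1.2, 0.4, 2.0]
print(f"Activation Density: {activation_density(acts):.3f}%")
# prints "Activation Density: 0.300%"
```

On this reading, a density of 0.310% means the feature is active on roughly 3 of every 1,000 tokens.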