Explanations
phrases related to general statements or observations about people
phrases indicating generalizations about people
Negative Logits
)].           -0.76
Skydragon     -0.74
SetTextColor  -0.73
cloth         -0.69
iang          -0.64
="/           -0.64
Void          -0.64
Demonic       -0.64
inders        -0.62
cule          -0.61
Positive Logits
anymore        0.74
asio           0.70
YP             0.67
irlf           0.64
answ           0.61
appe           0.60
penny          0.60
sugg           0.60
nor            0.59
fortun         0.58
Activation density: 0.035%