INDEX
Explanations
proper nouns or names
mentions of specific individuals or names
Negative Logits
Token         Logit
town          -0.66
ongyang       -0.64
Ģ             -0.63
arms          -0.60
pride         -0.60
HAEL          -0.59
Kinnikuman    -0.59
ridges        -0.58
cheeks        -0.57
orie          -0.57
Positive Logits
Token         Logit
ited          0.93
vironment     0.93
cend          0.91
thal          0.90
ction         0.89
swer          0.83
issance       0.83
ity           0.82
emies         0.82
cing          0.81
Activations Density: 0.025%