Explanations
people's names related to music, entertainment, or sports
Negative Logits
perse          -0.67
pires          -0.59
ISSION         -0.59
bourg          -0.58
xia            -0.58
avorite        -0.56
overriding     -0.56
naires         -0.56
jri            -0.56
standardized   -0.55
Positive Logits
tered           1.07
avia            1.02
room            1.00
allion          0.96
rooms           0.93
girl            0.92
chet            0.91
ista            0.91
ting            0.89
kid             0.88
Activations Density: 0.013%