Explanations
names of specific entities or locations
specific names or terms related to food, substances, or popular culture
Negative Logits
etheless         -0.80
compr            -0.70
guid             -0.69
coron            -0.69
irted            -0.66
ournal           -0.64
ebin             -0.64
magnification    -0.64
viously          -0.64
ammed            -0.63
Positive Logits
Bowl             0.99
Games            0.98
Shop             0.97
Strip            0.91
Club             0.91
Boys             0.90
Mile             0.89
haus             0.88
finger           0.88
Balls            0.88
Activations Density: 0.203%