INDEX
Explanations
names of famous individuals and specific locations
Negative Logits
ly        -0.84
cial      -0.74
LY        -0.73
Gork      -0.72
âĹ¼       -0.70
rely      -0.69
cffffcc   -0.68
bered     -0.67
ienced    -0.65
cles      -0.65
Positive Logits
iard      1.23
iday      1.06
igan      1.05
ows       1.03
igans     0.99
anova     0.96
iflower   0.95
wagen     0.94
aday      0.92
iston     0.91
Activations Density 6.265%
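
The density figure above is not defined on this page. A minimal sketch of one common reading, assuming it is the fraction of sampled token positions on which the feature's activation is above zero: the function name `activation_density`, the `threshold` parameter, and the sample values below are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled token positions on
    which the feature fires at all. The dashboard may define it differently.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations over eight token positions (made-up values).
sample = [0.0, 0.0, 1.3, 0.0, 0.2, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 6.265% would mean the feature was active on roughly one in sixteen sampled positions.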