INDEX
Explanations
proper names of individuals
proper nouns, particularly names of people and entities
Negative Logits
vana        -0.89
words       -0.85
earchers    -0.79
mates       -0.78
agra        -0.76
devices     -0.76
drivers     -0.75
ells        -0.75
aways       -0.74
nexus       -0.74
Positive Logits
Sch          1.08
Berg         1.04
Gonz         1.03
Ernst        1.03
Klopp        1.02
Jensen       1.00
Klein        1.00
Schneider    0.98
Luk          0.98
Karl         0.98
Activations Density 0.190%