Explanations
proper nouns, particularly names of people
Negative Logits
Token          Logit
pid            -0.17
eec            -0.15
lli            -0.15
vement         -0.15
orrow          -0.15
ãĥªãĥ¼ãĤº      -0.14
="__           -0.14
allon          -0.14
utility        -0.14
tery           -0.14
Positive Logits
Token          Logit
mann           0.41
berg           0.33
inger          0.30
acher          0.29
berger         0.29
hammer         0.29
heimer         0.28
auer           0.28
feld           0.28
me             0.27
Activations Density: 0.213%