INDEX
Explanations
proper nouns, particularly names of individuals
Negative Logits
Token      Logit
Wunused    -0.15
(blank)    -0.15
pto        -0.14
urrent     -0.14
OMIC       -0.14
Kerry      -0.14
Trouble    -0.14
hoops      -0.13
pard       -0.13
entina     -0.13
Positive Logits
Token      Logit
Phil       0.24
phia       0.21
Phil       0.19
phil       0.18
Philip     0.18
Phill      0.16
_ph        0.15
ãĥĢãĥ¼     0.15
bourg      0.15
Ph         0.15
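The token ãĥĢãĥ¼ in the positive logits looks like a byte-level BPE vocabulary entry rather than readable text. The sketch below shows one way to turn such an entry back into its characters, assuming the underlying model uses the GPT-2 byte-to-unicode mapping; the page does not state which tokenizer is in use, so that is an assumption. The helper name decode_token is hypothetical and used only for illustration.

```python
def bytes_to_unicode():
    """Build the GPT-2 style byte -> printable-character table.

    Assumption: the tokenizer behind this dashboard uses this mapping.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: map a vocabulary string back to UTF-8 text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Partial multi-byte sequences are replaced instead of raising.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥĢãĥ¼"))  # ダー
```

Under that assumption, the entry corresponds to the katakana sequence ダー, and plain ASCII tokens such as Phil decode to themselves.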
Activations Density 0.024%