Explanations
proper names, specifically focusing on the name Ivan
Negative Logits
Subject         -0.73
ORE             -0.72
aver            -0.70
reads           -0.70
Scient          -0.68
Score           -0.66
Methods         -0.66
aker            -0.63
shapeshifter    -0.63
¥µ              -0.63
Positive Logits
Ivan            1.22
ovich           1.07
ovic            0.96
hoe             0.92
Ily             0.81
Nikol           0.80
imov            0.77
tek             0.77
Neville         0.76
Nikola          0.74
Activations Density: 0.006%