Explanations
proper names, specifically first names like "Ivan" and "Igor."
names of individuals, particularly those starting with "Ivan" or "Igor."
Negative Logits
Token      Logit
Score      -0.74
aver       -0.69
reads      -0.68
inals      -0.67
Scient     -0.65
Gameplay   -0.65
ĻĤ         -0.64
Americans  -0.63
woman      -0.63
Methods    -0.62
Positive Logits
Token      Logit
ovich       1.15
Ivan        1.06
hoe         1.01
ovic        0.93
opol        0.92
ova         0.77
imov        0.76
stadt       0.76
Ily         0.75
Äį          0.75
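Dashboards of this kind commonly derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and then reading off the most negative and most positive vocabulary entries. The page does not say how its scores were computed, so treat that as an assumption. The sketch is plain Python. The helper names (`logit_weights`, `top_and_bottom`), the toy vocabulary, and all weights are hypothetical and are not this feature's real parameters. It shows only the ranking step.

```python
from typing import Sequence


def logit_weights(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> list[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix laid out as d_model rows x vocab_size columns.
    Assumption: this is how the dashboard's logit scores are defined."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(weights: Sequence[float], vocab: Sequence[str], k: int):
    """Return the k highest-scoring tokens (descending) and the k
    lowest-scoring tokens (ascending) as (token, score) pairs."""
    order = sorted(range(len(weights)), key=lambda t: weights[t])
    bottom = [(vocab[t], weights[t]) for t in order[:k]]
    top = [(vocab[t], weights[t]) for t in reversed(order[-k:])]
    return top, bottom


# Hypothetical toy data: d_model = 2, vocabulary of 4 tokens.
vocab = ["Ivan", "ovich", "woman", "Score"]
decoder_dir = [1.0, 0.5]
unembed = [
    [1.0, 0.9, -0.3, -0.5],
    [0.2, 0.6, -0.2, -0.4],
]

top, bottom = top_and_bottom(logit_weights(decoder_dir, unembed), vocab, k=2)
print([tok for tok, _ in top])     # ['ovich', 'Ivan']
print([tok for tok, _ in bottom])  # ['Score', 'woman']
```

A real model would use the full unembedding matrix and a vectorized matrix product rather than nested Python loops. The ranking logic stays the same.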
Activation Density: 0.012%
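Activation density is usually read as the fraction of tokens in an evaluation corpus on which the feature fires, meaning its activation is above zero. The page does not state the corpus or the threshold, so this definition is an assumption. Under it, 0.012% equals 1.2e-4, or roughly 12 firing tokens per 100,000. A minimal sketch with hypothetical activations (the `activation_density` helper is illustrative only):

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`.
    Assumption: density counts activations strictly above zero."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical corpus: 100,000 tokens, of which 12 activate the feature.
acts = [0.0] * 99_988 + [3.1] * 12
print(f"{activation_density(acts):.3%}")  # prints 0.012%
```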