Explanations
references to a specific individual, likely involved in notable events or discussions
Negative Logits
avers    -0.17
verse    -0.16
ails     -0.16
ENTE     -0.15
achts    -0.15
tank     -0.14
ugin     -0.14
herits   -0.14
нь       -0.14
usk      -0.14
Positive Logits
oping    0.20
sco      0.19
oop      0.18
oped     0.18
oters    0.18
Sco      0.17
oby      0.17
oter     0.16
eye      0.16
rido     0.15
Activations Density 0.007%