Explanations
references to historical figures and their contributions
Negative Logits
šak    -0.17
antal  -0.16
cest   -0.15
liers  -0.15
vox    -0.14
lož    -0.14
bri    -0.14
aan    -0.14
luk    -0.13
uite   -0.13
Positive Logits
Hel    0.26
Fel    0.23
Hel    0.22
Fel    0.20
Gel    0.19
fel    0.19
HEL    0.19
ेल     0.18
Felix  0.18
fel    0.18
Activations Density: 0.068%