Explanations
references to separation or distance
Negative Logits
frauen    -0.16
aggio     -0.16
lại       -0.16
iams      -0.16
odore     -0.16
esses     -0.15
ovah      -0.15
ILLA      -0.15
empo      -0.15
lez       -0.15
Positive Logits
ward      0.34
wards     0.23
yyyy      0.22
yyy       0.20
/down     0.20
WARD      0.18
/on       0.18
/up       0.17
eward     0.17
ÌĢ        0.17
Activations Density 0.035%