INDEX
Explanations
statements that indicate existence or presence
Negative Logits
ml        -0.17
li        -0.16
loo       -0.16
uesto     -0.15
ynch      -0.15
okino     -0.14
rest      -0.14
Vital     -0.14
vection   -0.14
ASI       -0.14
Positive Logits
amo        0.25
trov       0.20
può        0.19
tro        0.18
è          0.18
era        0.18
inn        0.18
oux        0.17
diffuse    0.17
tratt      0.17
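The Negative Logits and Positive Logits lists show the tokens whose output logits this feature most suppresses and most promotes. A common way feature dashboards derive such lists, assumed here rather than confirmed by the page, is to multiply the feature's decoder direction by the model's unembedding matrix and sort the result. The NumPy sketch below illustrates that assumption. `top_logit_effects`, the toy vocabulary, and both matrices are hypothetical.

```python
import numpy as np

def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Hypothetical helper: project a feature's decoder direction through the
    unembedding matrix W_U (d_model x vocab_size) and return the k most
    positive and k most negative (token, value) pairs."""
    # Direct effect of the feature direction on every vocabulary logit
    effects = decoder_vec @ W_U  # shape: (vocab_size,)
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example (made-up data): a 4-dim residual stream and a 6-token vocabulary
vocab = ["amo", "ml", "trov", "li", "tok_a", "tok_b"]
W_U = np.zeros((4, 6))
W_U[0] = [0.25, -0.17, 0.20, -0.16, 0.0, 0.10]
decoder_vec = np.array([1.0, 0.0, 0.0, 0.0])

pos, neg = top_logit_effects(decoder_vec, W_U, vocab, k=2)
print(pos)  # most promoted tokens
print(neg)  # most suppressed tokens
```

Because these values come from a linear projection, they describe only the feature's direct path to the logits, not effects routed through later layers.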
Activation Density: 0.002%
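Activation density is usually read as the share of token positions on which the feature fires at all. A density of 0.002% therefore corresponds to about 2 tokens in every 100,000. The minimal sketch below assumes "fires" means an activation strictly above zero, since the page does not state its threshold. `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of token positions whose feature
    activation is strictly greater than `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Made-up data: 100,000 token positions, the feature fires on 2 of them
acts = [0.0] * 100_000
acts[10] = 1.3
acts[500] = 0.4
density = activation_density(acts)
print(f"{density}%")  # 2 / 100,000 tokens as a percentage
```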