INDEX
Explanations
linguistic and structural elements in written text
Negative Logits
lus          -0.18
lrt          -0.17
enso         -0.16
laz          -0.16
ventus       -0.15
lain         -0.15
Inspectable  -0.15
abox         -0.15
ế            -0.14
agini        -0.14
Positive Logits
au           0.40
aux          0.40
al           0.32
Aux          0.29
alla         0.28
ao           0.28
aos          0.28
Au           0.28
aux          0.27
au           0.25
Activations Density 0.045%