Explanations
references to categories, types, or classifications of items and evidence
Negative Logits
Token     Logit
'&#       -0.15
æ§        -0.15
ades      -0.15
lil       -0.14
awa       -0.13
amel      -0.13
aviest    -0.13
omi       -0.13
ami       -0.13
ž         -0.13
Positive Logits
Token     Logit
besides   0.26
than      0.21
-than     0.20
than      0.20
world     0.19
niż       0.18
bes       0.17
equally   0.17
_than     0.17
Bes       0.16
Activations Density: 0.238%