Explanations
expressions indicating comparison or addition of information
Negative Logits
ismet        -0.18
anners       -0.15
557          -0.15
iglia        -0.14
kur          -0.14
sen          -0.14
trú          -0.14
anny         -0.14
Encounter    -0.13
irting       -0.13
Positive Logits
than         0.34
-than        0.27
_than        0.24
niż          0.24
THAN         0.24
than         0.22
än           0.22
Than         0.20
Than         0.20
ột           0.20
Activations Density 0.052%