INDEX
Explanations
comparative adjectives that describe improvements or advantages
Negative Logits
ediği    -0.16
ismet    -0.16
atori    -0.15
نگی      -0.15
edis     -0.15
XCT      -0.14
acier    -0.14
edef     -0.14
ATAB     -0.14
esel     -0.14
Positive Logits
than     0.66
than     0.49
THAN     0.43
then     0.43
Than     0.41
Than     0.39
_than    0.39
tha      0.37
tan      0.36
än       0.35
Activations Density 0.124%