INDEX
Explanations
adjectives and comparative phrases associated with quality or success
Negative Logits
erah        -0.16
enough      -0.16
достаточно  -0.15  (Russian, "enough")
akan        -0.14
Enough      -0.14
够          -0.14  (Chinese, "enough")
досить      -0.14  (Ukrainian, "enough")
достат      -0.14  (Cyrillic stem, as in "достаточно")
acos        -0.14
olest       -0.14
Positive Logits
than   1.47
than   1.22
Than   1.11
THAN   1.11
Than   1.08
-than  1.06
_than  1.01
niż    0.79  (Polish, "than")
än     0.77  (Swedish, "than")
než    0.76  (Czech, "than")
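Lists like the positive and negative logits are typically produced by a logit-lens style readout of the feature. The sketch below is a minimal illustration that assumes, without confirmation from this page, that each token's score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix, ignoring any final LayerNorm scaling. The names `logit_effects`, `decoder_dir`, `unembed`, and `vocab` are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs.

    Assumption: score(t) = sum_i decoder_dir[i] * unembed[i][t], where
    unembed has shape (d_model, n_vocab). No LayerNorm scaling is applied.
    """
    d_model = len(decoder_dir)
    # Project the feature's decoder direction onto every vocabulary column.
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by score: the front holds the most negative tokens.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
    pos, neg = logit_effects(
        [1.0, 0.0],
        [[1.5, -0.2, 0.1], [0.3, 0.3, 0.3]],
        ["than", "enough", "the"],
        k=1,
    )
    print(pos, neg)
```

With the toy inputs, "than" scores highest and "enough" lowest, mirroring the shape (not the values) of the lists on this page.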
Activation density: 0.750%
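Activation density is the share of token positions on which the feature fires. The sketch below assumes that "fires" means a strictly positive activation and that the activations come from some sample of tokens; the page does not state its threshold or sample, and `activation_density` is a hypothetical name.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        # Count a position as active only if it is strictly above the threshold.
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations given")
    return 100.0 * active / total
```

For example, 3 active positions out of 400 would give a density of 0.75 percent.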