Explanations
comparative phrases indicating quality or similarity
Negative Logits
ho               -0.18
ha               -0.15
ant              -0.14
Editors          -0.14
©                -0.14
Leer             -0.14
615              -0.13
ÏĦÏĥ             -0.13
ander            -0.13
ingo             -0.13
Positive Logits
than             0.19
-than            0.18
THAN             0.16
Than             0.15
Ú©Ùĩ             0.15
ulary            0.15
ThanOrEqualTo    0.14
than             0.14
ething           0.14
_than            0.14
Activations Density 0.010%