INDEX
Explanations
phrases indicating quantity or comparative relationships
Negative Logits
ouz      -0.16
tolik    -0.15
hurst    -0.15
iza      -0.15
izu      -0.15
ruž      -0.14
ŀæĢ§     -0.14
isia     -0.14
ÑĨÑĥ     -0.14
astle    -0.14
Positive Logits
as       0.29
early    0.19
EAR      0.17
early    0.16
как      0.16
dès      0.15
encil    0.14
Vine     0.14
æĹ©      0.14
EAR      0.14
Activations Density 0.021%