Explanations
phrases that emphasize totality or completeness
Negative Logits
本        -0.17
roz       -0.16
дина      -0.16
ッチ      -0.15
будь      -0.15
нина      -0.15
iper      -0.14
isle      -0.14
edom      -0.14
lug       -0.14
Positive Logits
of        0.26
manner    0.20
those     0.17
those     0.15
των       0.15
awi       0.15
vier      0.15
taj       0.14
involved  0.14
/all      0.14
Activations Density 0.061%