Explanations
phrases indicating comparisons and expressions of identity
Negative Logits
obe           -0.18
ahren         -0.17
itos          -0.15
bian          -0.15
EOF           -0.15
icit          -0.14
仕            -0.14
鎮            -0.14
.construct    -0.14
Parts         -0.14
Positive Logits
bergen        0.16
alous         0.15
Mot           0.14
SZ            0.14
gf            0.14
ż             0.14
Ap            0.13
breathed      0.13
vore          0.13
Ap            0.13
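Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the k largest and k smallest entries. The sketch below shows that idea on a toy example. It assumes this method (the page itself does not state how its lists were computed), and the names `logit_effects` and `top_and_bottom` as well as the toy matrices are hypothetical.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Project a feature's decoder direction through the unembedding.

    Assumption: decoder_dir is a list of d_model floats, and unembed is a
    d_model x vocab_size matrix stored as a list of rows. Returns a dict
    mapping each token to its direct logit contribution.
    """
    d_model = len(decoder_dir)
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    return dict(zip(vocab, scores))


def top_and_bottom(effects, k):
    """Return (k most positive, k most negative) (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    positive = ranked[-k:][::-1]  # largest first
    negative = ranked[:k]         # most negative first
    return positive, negative


# Hypothetical toy model: d_model = 2, vocabulary of three tokens.
effects = logit_effects([1.0, 0.5], [[0.2, -0.4, 0.1], [0.0, 0.2, -0.6]], ["a", "b", "c"])
pos, neg = top_and_bottom(effects, 1)
print(pos, neg)
```

On a real model the same projection would run over the full vocabulary with k = 10, matching the length of each list above.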
Activation Density: 0.436%
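Activation density is usually read as the percentage of sampled tokens on which the feature fires, meaning its activation is above zero. Under that reading, 0.436% means the feature is active on roughly 4 to 5 of every 1,000 tokens. The sketch below assumes that definition and a zero threshold. The function name `activation_density` is hypothetical, and the input would be the feature's activations over some token sample.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: density counts strictly positive activations by default.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


print(activation_density([0.0, 0.0, 1.2, 0.0]))  # one of four tokens active
```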