INDEX
Explanations
phrases indicating persistence or inevitability in situations
Negative Logits
WND     -0.17
_simps  -0.17
adla    -0.16
vox     -0.15
イヤ    -0.15
AIT     -0.14
ewan    -0.14
UTO     -0.14
rowse   -0.14
zos     -0.14
Positive Logits
this    0.41
thus    0.38
这样    0.36
this    0.35
así     0.35
那样    0.32
thus    0.31
THAT    0.31
böyle   0.31
vậy     0.30
Activations Density 0.363%