Explanations
phrases indicating contradictions or conflicts within ideas or beliefs
Negative Logits
Token    Logit
zos      -0.16
erd      -0.16
опол     -0.15
abet     -0.15
arih     -0.15
ritte    -0.14
wap      -0.14
andin    -0.13
unic     -0.13
etc      -0.13
Positive Logits
Token          Logit
/or            0.26
/OR            0.22
/from          0.21
ients          0.17
those          0.16
its            0.15
actual         0.14
それ           0.14
what           0.13
counterpart    0.13
Activations Density 0.171%
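
Token strings taken from a byte-level BPE vocabulary are sometimes displayed in their raw byte-to-character form, so a multi-byte token such as the Japanese それ can appear as ãģĿãĤĮ. The sketch below shows one way to map such a display string back to readable text. It assumes the model's tokenizer uses the GPT-2 style byte-to-unicode table, which this page does not state; the helper names `byte_to_display_char` and `decode_token` are hypothetical and chosen only for illustration.

```python
def byte_to_display_char():
    """Build a byte -> display-character table in the style of GPT-2's
    byte-level BPE (assumed here; the source does not name the tokenizer)."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    # Every remaining byte is shifted to code point 256 and above, in byte order.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: display character -> original byte value.
DISPLAY_TO_BYTE = {ch: b for b, ch in byte_to_display_char().items()}


def decode_token(display):
    """Convert a byte-level display string back to UTF-8 text.

    Raises KeyError if the string contains a character outside the table,
    which usually means it was already decoded.
    """
    raw = bytes(DISPLAY_TO_BYTE[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãģĿãĤĮ"))  # それ
```

Plain ASCII tokens such as `/or` pass through unchanged, and a leading `Ġ` in a display string decodes to a space under the same assumption.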