INDEX
Explanations
expressions related to beliefs, thoughts, and assumptions
Negative Logits
Token             Logit
Acerca            -0.61
him               -0.60
Darauf            -0.59
honom             -0.59
évaluateur        -0.58
对我              -0.55
otomatig          -0.53
对他              -0.53
didst             -0.52
BuilderFactory    -0.52
Positive Logits
Token             Logit
they              2.24
we                1.77
there             1.76
it                1.32
he                1.27
she               1.16
they              1.16
you               1.10
there             1.03
они               1.02
Activations Density: 0.626%