INDEX
Explanations
questions or uncertainty about choices and conditions
Negative Logits
不了  -0.19
him  -0.18
ним  -0.16
herself  -0.16
Them  -0.15
不到  -0.15
NEVER  -0.15
ed  -0.15
eux  -0.15
asla  -0.14
Positive Logits
/how  0.56
there  0.50
they  0.50
it  0.46
we  0.42
anyone  0.40
anybody  0.38
/if  0.37
anything  0.34
there  0.34
Activations Density 0.084%