Explanations
states, conditions, and concepts
Negative Logits
a      1.12
at     1.02
and    0.96
อบ     0.84
f      0.82
j      0.80
dro    0.79
do     0.78
de     0.77
all    0.73
Positive Logits
ز      1.04
н      1.01
т      0.96
:      0.91
'      0.88
д      0.85
з      0.84
불구하고  0.76
г      0.75
х      0.75
Activations Density 0.001%