INDEX
Explanations
boundary setting and violations
Negative Logits
Token    Logit
اين    1.09
ري    1.06
alcoved    0.98
ва    0.95
ра    0.94
ба    0.94
в    0.90
j    0.89
CUSSION    0.89
ш    0.89
Positive Logits
Token    Logit
i    1.25
er    1.20
iin    1.13
el    1.11
et    1.11
as    1.05
ir    1.04
for    1.01
ก    1.01
am    0.98
Activations Density: 0.022%