Explanations
conditional phrases indicating purpose or reasoning
Negative Logits
poÄįet   -0.15
ivol     -0.15
дем      -0.14
ienne    -0.14
inan     -0.14
rug      -0.14
rats     -0.14
åĻ       -0.14
arget    -0.14
æĢİ      -0.14
Positive Logits
ìį¨      0.24
-called  0.21
forth    0.20
that     0.18
inel     0.17
zial     0.17
future   0.16
oth      0.16
proper   0.15
elman    0.15
Activations Density 0.048%