INDEX
Explanations
phrases indicating a change in situation or unexpected outcomes
Negative Logits
idan          -0.19
ime           -0.17
cki           -0.15
èij           -0.15
agem          -0.15
unprotected   -0.15
imes          -0.15
ist           -0.14
çĿ            -0.14
uset          -0.14
Positive Logits
into          0.17
caff          0.16
onCancelled   0.15
nout          0.15
out           0.15
شتر           0.15
LayoutPanel   0.14
owie          0.14
tail          0.14
ůst           0.14
Activations Density 0.015%