Explanations
references to taking control over one's own situation
Negative Logits
Token     Logit
owitz     -0.17
806       -0.17
ihn       -0.16
ekim      -0.16
utz       -0.15
Coord     -0.15
odash     -0.15
htable    -0.15
azu       -0.15
域        -0.14
Positive Logits
Token     Logit
into      0.28
hands     0.27
nelle     0.26
in        0.26
Hands     0.25
Into      0.25
manos     0.24
落        0.21
Into      0.21
_into     0.21
Activations Density 0.069%