Explanations
phrases indicating preparation or readiness to act
Negative Logits
↵↵             -0.19
handjob        -0.16
stay           -0.16
styleType      -0.16
remained       -0.15
Remain         -0.14
pra            -0.14
/Typography    -0.14
ité            -0.14
äter           -0.14
Positive Logits
tackle         0.21
rock           0.21
tackling       0.21
action         0.20
accept         0.20
rum            0.20
tackled        0.19
Rock           0.18
hit            0.18
go             0.18
Activations Density: 0.072%