INDEX
Explanations
verbs related to physical actions involving throwing or moving oneself forcefully
phrases involving self-immersion or self-sacrifice
Negative Logits
ÃŁ          -0.78
Marginal    -0.72
SET         -0.69
Developer   -0.67
Administ    -0.65
APH         -0.64
ND          -0.64
QUI         -0.63
QU          -0.63
achine      -0.63
Positive Logits
overboard   1.12
tant        0.85
grenades    0.84
towel       0.76
grenade     0.74
torch       0.66
punches     0.66
insults     0.64
insult      0.64
lor         0.64
Activations Density 0.104%