Explanations
phrases related to physical acts of aggression or impact
references to the word "punch" in various contexts
Negative Logits
Token        Logit
uve          -0.92
abeth        -0.76
aird         -0.68
icter        -0.65
Private      -0.64
Citizen      -0.64
Neural       -0.64
udic         -0.62
rians        -0.61
Archdemon    -0.61
Positive Logits
Token        Logit
bowl          1.25
punches       0.87
punch         0.84
aneers        0.80
bag           0.76
istani        0.75
outs          0.75
punching      0.74
sticks        0.74
cart          0.74
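The two tables above list the tokens whose output logits this feature pushes down and up most strongly. A common way feature dashboards produce such lists is to project the feature's decoder vector through the model's unembedding matrix and sort the result. The sketch below assumes that method (it is not confirmed here), ignores the final LayerNorm, and uses hypothetical names (`feature_logit_table`, `w_dec_row`, `W_U`, `vocab`) with toy matrices rather than real model weights.

```python
import numpy as np

def feature_logit_table(w_dec_row, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: the feature's direct effect on the output logits is its
    decoder vector (shape [d_model]) multiplied by the unembedding matrix
    W_U (shape [d_model, d_vocab]); the final LayerNorm is ignored.
    """
    logits = w_dec_row @ W_U          # one logit contribution per vocab token
    order = np.argsort(logits)        # token indices, ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
W_U = np.array([[1.0, -1.0, 0.5, 0.0],
                [0.0,  0.0, 0.0, 3.0]])
w_dec_row = np.array([1.0, 0.25])
neg, pos = feature_logit_table(w_dec_row, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # [('b', -1.0), ('c', 0.5)]
print(pos)  # [('a', 1.0), ('d', 0.75)]
```

With a real vocabulary of tens of thousands of tokens and k=10, the two returned lists would take the shape of the Negative Logits and Positive Logits tables.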
Activation Density: 0.014%
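Activation density is usually read as the fraction of token positions on which the feature fires, so 0.014% corresponds to about 14 firing tokens per 100,000. A minimal sketch, assuming "fires" means an activation strictly above zero; the function name `activation_density` and the threshold parameter are hypothetical:

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active on a token when its
    activation is strictly greater than zero (the default threshold).
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# Small example: the feature fires on 2 of 5 tokens.
small = activation_density([0.0, 0.0, 0.5, 0.0, 1.2])

# Synthetic run: 14 firing tokens out of 100,000 gives 0.014%.
acts = np.zeros(100_000)
acts[:14] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.014%
```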