INDEX
Explanations
mentions of physical violence or aggression, particularly related to the concept of a "dog-eat-dog" competition
Negative Logits
éĹĺ        -0.87
DERR       -0.85
Edison     -0.81
esson      -0.78
artz       -0.78
oulos      -0.74
velength   -0.70
farious    -0.68
WARN       -0.67
ORN        -0.67
Positive Logits
gie        1.11
patch      1.10
barking    1.06
fighting   1.03
fight      1.01
meat       1.00
matic      0.97
fights     0.95
matically  0.94
catcher    0.94
Activations Density 0.032%