INDEX
Explanations
phrases related to physical violence and bodily harm
violent actions and injuries
Negative Logits
Token        Logit
ivals        -0.83
Maps         -0.78
SPA          -0.74
Money        -0.71
roleum       -0.70
Beasts       -0.70
Introduced   -0.70
Ranked       -0.69
Bus          -0.68
iseum        -0.68
Positive Logits
Token          Logit
protr          1.03
stretched      0.98
exposing       0.92
horizontally   0.88
backwards      0.87
revealing      0.87
sideways       0.87
tightly        0.86
perpend        0.85
neatly         0.83
Activations Density: 0.207%