INDEX
Explanations
specific words related to physical encounters or conflicts
language related to confrontations or encounters between individuals or groups
Negative Logits
bern        -0.78
duc         -0.70
Balt        -0.67
za          -0.67
umb         -0.66
leaf        -0.65
hood        -0.63
wheel       -0.63
aku         -0.62
rib         -0.61
Positive Logits
between      1.12
with         0.88
WITH         0.87
halla        0.85
BET          0.85
Between      0.84
between      0.82
iques        0.80
involving    0.76
roy          0.75
Activations Density: 0.104%