INDEX
Explanations
instances of words related to impactful physical actions or events such as hitting, getting hit, injuries, accidents, medical care, and arrests in various contexts
prepositions, especially "by" and "out"
Head Attr Weights
0:0.07
1:0.02
2:0.11
3:0.05
4:0.32
5:0.04
6:0.02
7:0.01
8:0.16
9:0.08
10:0.04
11:0.02
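
The head attribution weights listed above can be ranked to see which attention heads contribute most to this feature. The snippet below is a minimal sketch, not part of the original dashboard: the dictionary values are copied from the list, while the names `head_attr` and `top_heads` are hypothetical and introduced only for illustration.

```python
# Head attribution weights copied from the list above (head index -> weight).
head_attr = {
    0: 0.07, 1: 0.02, 2: 0.11, 3: 0.05, 4: 0.32, 5: 0.04,
    6: 0.02, 7: 0.01, 8: 0.16, 9: 0.08, 10: 0.04, 11: 0.02,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


top = top_heads(head_attr)
total = sum(head_attr.values())

print(top)               # heads 4, 8, and 2 carry the largest weights
print(round(total, 2))   # the listed weights sum to 0.94, not exactly 1
```

Under this reading, head 4 alone accounts for roughly a third of the listed attribution, with heads 8 and 2 next.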
Negative Logits
behind    -1.58
iltr      -1.55
culated   -1.50
pired     -1.41
liness    -1.40
kered     -1.40
grand     -1.36
ever      -1.35
symb      -1.35
kept      -1.33
Positive Logits
��        1.38
therap    1.35
Attach    1.33
blot      1.30
Dems      1.27
injury    1.24
Heller    1.23
liberals  1.20
heap      1.20
Shogun    1.19
Activations Density 0.022%