Explanations
phrases related to physical impacts or strong attacks
occurrences of the word "hit" and its variations, typically in contexts related to physical impact or injury
Negative Logits
pires      -0.82
æ©Ł        -0.73
UTH        -0.72
utical     -0.68
theless    -0.67
æĥ         -0.64
cia        -0.64
¥µ         -0.63
Philos     -0.62
colours    -0.62
Positive Logits
ched       1.20
chens      1.13
ches       0.98
box        0.96
achi       0.93
ters       0.86
gerald     0.86
boxes      0.84
ted        0.84
rod        0.81
Activations Density 0.028%