INDEX
Explanations
phrases that indicate actions or accusations related to individuals or groups
Head Attribution Weights
Head    0     1     2     3     4     5     6     7     8     9     10    11
Weight  0.02  0.03  0.07  0.07  0.02  0.03  0.05  0.05  0.06  0.27  0.12  0.17
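Most of the attribution sits in heads 9, 11 and 10. The short Python sketch below ranks heads by the weights listed above and computes what share of the total the top three carry. It takes the weights as given; how the dashboard computes them is not stated here, and `top_heads` is a hypothetical helper.

```python
# Head attribution weights exactly as listed above (head index -> weight).
head_attr = {0: 0.02, 1: 0.03, 2: 0.07, 3: 0.07, 4: 0.02, 5: 0.03,
             6: 0.05, 7: 0.05, 8: 0.06, 9: 0.27, 10: 0.12, 11: 0.17}

def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weight, highest first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]

top = top_heads(head_attr)
# Share of the summed weight carried by the top-k heads
# (the listed weights sum to about 0.96 because of rounding).
share = sum(w for _, w in top) / sum(head_attr.values())

print(top)             # [(9, 0.27), (11, 0.17), (10, 0.12)]
print(f"{share:.0%}")  # 58%
```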
Negative Logits
Token        Logit
Comments     -1.30
Pokémon      -1.26
Size         -1.25
greets       -1.21
Discussion   -1.21
FILE         -1.16
unfolds      -1.15
ISE          -1.14
Ball         -1.14
oku          -1.14
Positive Logits
Token        Logit
wounding      1.48
distortion    1.35
cannibal      1.34
grave         1.30
sabot         1.30
killing       1.30
gou           1.26
VAT           1.25
distortions   1.18
inflicting    1.17
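The page does not say how these logit lists are produced. A common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and take the most positive and most negative entries. The sketch below uses NumPy with toy random matrices; `feature_logit_effects`, `w_dec_row` and `w_u` are hypothetical names, and a real pipeline may also apply the final LayerNorm scale and map token ids back to strings with the model's tokenizer.

```python
import numpy as np

def feature_logit_effects(w_dec_row, w_u, k=10):
    """Project one feature's decoder direction onto the vocabulary (assumed method).

    w_dec_row: (d_model,) decoder vector for the feature (hypothetical input).
    w_u:       (d_model, d_vocab) unembedding matrix (hypothetical input).
    Returns (top_pos_ids, top_neg_ids, logits); the id arrays have length k.
    """
    logits = w_dec_row @ w_u      # (d_vocab,) direct effect on each token's logit
    order = np.argsort(logits)    # token ids sorted by logit, ascending
    top_neg = order[:k]           # most negative first
    top_pos = order[::-1][:k]     # most positive first
    return top_pos, top_neg, logits

# Toy example with random weights (no real model involved).
rng = np.random.default_rng(0)
w_dec_row = rng.normal(size=16)
w_u = rng.normal(size=(16, 50))
top_pos, top_neg, logits = feature_logit_effects(w_dec_row, w_u, k=5)
print(top_pos, logits[top_pos].round(2))
print(top_neg, logits[top_neg].round(2))
```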
Activation Density: 0.011%
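A density of 0.011% means the feature fires on roughly 11 of every 100,000 tokens. The sketch below assumes density is the fraction of sampled tokens whose activation exceeds zero; the dashboard's exact threshold and token sample are not given, so `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which the feature's activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return float(np.count_nonzero(acts > threshold)) / acts.size

# Toy example: 100,000 tokens, the feature fires on 11 of them.
acts = np.zeros(100_000)
acts[:11] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.011%
```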