INDEX
Explanations
phrases and language indicating conflict or competition
Head Attr Weights
0:0.04
1:0.04
2:0.05
3:0.08
4:0.06
5:0.05
6:0.05
7:0.43
8:0.03
9:0.03
10:0.05
11:0.04
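To read the head attribution weights at a glance, the short sketch below copies the twelve values listed above into a dictionary and reports which head carries the most weight and what share of the listed total it represents. The helper names `dominant_head` and `share_of_total` are hypothetical, introduced only for illustration. The listed weights sum to 0.95 rather than 1.00, which is assumed here to be a rounding effect of the two-decimal display.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.04, 1: 0.04, 2: 0.05, 3: 0.08, 4: 0.06, 5: 0.05,
    6: 0.05, 7: 0.43, 8: 0.03, 9: 0.03, 10: 0.05, 11: 0.04,
}


def dominant_head(weights):
    """Return (head index, weight) for the head with the largest weight."""
    head = max(weights, key=weights.get)
    return head, weights[head]


def share_of_total(weights, head):
    """Return the given head's weight as a fraction of the sum of all listed weights."""
    return weights[head] / sum(weights.values())


top_head, top_weight = dominant_head(head_weights)
top_share = share_of_total(head_weights, top_head)

print(f"dominant head: {top_head} (weight {top_weight:.2f})")
print(f"share of listed total: {top_share:.1%}")
```

On these numbers, head 7 stands out clearly: its weight of 0.43 is more than five times that of the next-largest head (head 3 at 0.08).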
Negative Logits
Token       Logit
Colleges    -2.61
eneg        -2.56
Obesity     -2.25
Glacier     -2.21
gow         -2.19
Jaguar      -2.15
culosis     -2.13
Torn        -2.11
vernment    -2.11
gor         -2.10
Positive Logits
Token       Logit
00200000    3.23
former      2.66
erville     2.54
之          2.52
dayName     2.40
charm       2.39
answ        2.36
Filename    2.34
uitous      2.29
プ          2.26
Activations Density 0.038%