INDEX
Explanations
phrases that express falsehood or deception
Head Attr Weights
0:0.08
1:0.08
2:0.08
3:0.07
4:0.07
5:0.08
6:0.08
7:0.08
8:0.08
9:0.08
10:0.08
11:0.08
Negative Logits
estate      -3.58
essen       -3.54
Hispanic    -2.94
Southern    -2.93
gas         -2.89
aum         -2.88
atown       -2.86
roe         -2.77
overty      -2.72
tin         -2.70
Positive Logits
Typhoon     2.81
Typh        2.60
Shogun      2.57
Bus         2.47
Jinping     2.45
doubtless   2.41
queues      2.39
Constable   2.38
Farage      2.36
Huawei      2.36
Activations Density 0.000%