INDEX
Explanations
political and legal references involving blame, accountability, and public statements
Head Attr Weights
0:0.06
1:0.02
2:0.27
3:0.16
4:0.03
5:0.05
6:0.03
7:0.06
8:0.04
9:0.04
10:0.16
11:0.03
Negative Logits
adversity    -2.40
situations   -2.35
situation    -2.29
kindness     -2.26
manners      -2.26
Volunte      -2.21
2100         -2.18
Younger      -2.16
friendships  -2.13
toughness    -2.11
Positive Logits
debunked       2.68
authenticated  2.47
uthor          2.46
dubbed         2.43
furiously      2.40
vehemently     2.38
billed         2.32
successfully   2.29
discredited    2.29
���            2.27
Activations Density 0.075%