INDEX
Explanations
strong, critical adjectives or phrases
emotionally charged adjectives that convey strong criticism or commentary
Negative Logits
hops      -0.95
qqa       -0.81
gat       -0.78
slave     -0.77
cells     -0.76
ween      -0.76
chwitz    -0.73
plane     -0.73
Jump      -0.71
Stores    -0.70
Positive Logits
commentary        1.29
rebuke            1.23
indictment        1.23
critique          1.20
rebutt            1.19
understatement    1.16
remarks           1.14
statement         1.14
conclusion        1.11
explanation       1.08
Activations Density: 0.233%