INDEX
Explanations
sources or places where information is conveyed, such as news outlets or interviews
mentions of news organizations and their publications
Negative Logits
veter             -0.64
animate           -0.59
perfected         -0.58
diaper            -0.58
causal            -0.57
atible            -0.56
harms             -0.55
perman            -0.54
underestimated    -0.52
parity            -0.51
Positive Logits
.                 0.82
.</               0.72
quoted            0.71
quoting           0.69
.).               0.68
referring         0.67
lied              0.66
."                0.64
).                0.64
].                0.63
Activations Density 0.216%
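
One way to read the density figure, sketched under an assumption: the page does not define "Activations Density", so this treats it as the fraction of token positions on which the feature has a nonzero activation. The function name, the threshold parameter, and the sample data are all hypothetical and chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition; the source page does not state how density is computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 216 of which activate the feature.
sample = [1.5] * 216 + [0.0] * (100_000 - 216)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.216%
```

Under this reading, a density of 0.216% would mean the feature fires on roughly 2 of every 1,000 tokens, which fits a sparse feature tied to a narrow context such as quotation and attribution.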