INDEX
Explanations
controversial topics or subjects
references to controversial subjects
Negative Logits
abiding    -0.85
vation     -0.83
abetic     -0.77
thia       -0.75
nings      -0.75
ILA        -0.74
á          -0.73
abet       -0.72
ruary      -0.71
united     -0.71
Positive Logits
topic          0.99
topics         0.95
aspects        0.93
proposition    0.91
aspect         0.90
propositions   0.89
decisions      0.88
opinions       0.86
remarks        0.84
controversial  0.84
Activations Density 0.073%