INDEX
Explanations
controversial topics or statements
references to controversial topics or issues
Negative Logits
vation      -0.89
elsen       -0.87
nings       -0.78
á           -0.78
strings     -0.76
hower       -0.75
abetic      -0.73
minster     -0.73
ruary       -0.72
abetes      -0.72
Positive Logits
aspects      0.92
topic        0.91
topics       0.90
ity          0.87
proposition  0.83
opinions     0.83
viewpoints   0.82
views        0.81
aspect       0.81
fringe       0.80
Activations Density 0.078%