INDEX
Explanations
text related to political news and events
Negative Logits
xtap        -0.55
arij        -0.52
iple        -0.49
ortium      -0.47
anecd       -0.47
summed      -0.46
refers      -0.46
puzzled     -0.45
consists    -0.44
oret        -0.44
Positive Logits
'.          0.77
)).         0.76
".          0.73
.</         0.71
.[          0.65
$.          0.65
.''.        0.64
.''         0.64
?".         0.63
''.         0.62
Activations Density 17.962%