INDEX
Explanations
mentions of books, authors, and historical events or figures in a political context
Negative Logits
avorite         -0.81
sylv            -0.72
medium          -0.71
-               -0.71
depending       -0.71
Deal            -0.70
mitter          -0.69
nant            -0.69
oresc           -0.68
umerous         -0.68
Positive Logits
Other            0.98
Its              0.90
Others           0.89
Beyond           0.88
Friends          0.87
Politics         0.87
Dying            0.87
Mysterious       0.84
Transformation   0.83
Problem          0.83
Activations Density: 0.183%