INDEX
Explanations
texts related to political figures and events
Negative Logits
iple    -0.76
izen    -0.72
ire     -0.70
aimon   -0.70
oll     -0.67
chin    -0.65
isi     -0.65
mite    -0.65
ocl     -0.64
iang    -0.64
Positive Logits
noting      1.22
citing      1.19
prompting   1.19
albeit      1.14
preferring  1.12
claiming    1.11
including   1.11
alleging    1.09
namely      1.09
hoping      1.07
Activations Density: 1.507%