Explanations
references to political figures and their actions
Negative Logits
Token           Logit
Alic            -0.72
Loki            -0.68
Aph             -0.66
Spectre         -0.66
Prism           -0.65
Reincarnated    -0.65
Philips         -0.64
âĤ¬             -0.64
Lego            -0.63
rists           -0.63
Positive Logits
Token           Logit
WASHINGTON      0.82
³³³³            0.79
CLOSE           0.79
³³³             0.79
Correct         0.78
Washington      0.76
pmwiki          0.76
LOS             0.75
Correction      0.73
SHARE           0.72
Activations Density: 0.078%