INDEX
Explanations
mentions of historical events or changes over time
references to historical context or past events
Negative Logits
wake         -0.74
Redditor     -0.71
Dialogue     -0.70
=>           -0.69
Notes        -0.68
apply        -0.67
akens        -0.67
alias        -0.65
interven     -0.64
ibliography  -0.64
Positive Logits
existed      0.73
unthinkable  0.70
ruled        0.70
dinosaurs    0.66
dated        0.66
feared       0.66
linear       0.63
Linear       0.63
envisioned   0.62
innocence    0.61
Activations Density 1.273%