Explanations
mentions of historical events or contexts
references to historical events or concepts
Negative Logits
vae       -0.73
rative    -0.73
ients     -0.72
rowd      -0.71
inki      -0.70
ched      -0.70
wer       -0.70
ividual   -0.70
utters    -0.69
ractive   -0.69
Positive Logits
history    1.20
History    1.01
ISTORY     0.99
history    0.93
histories  0.92
Timeline   0.85
Medals     0.79
buffs      0.79
record     0.76
corpus     0.76
Activations Density: 0.026%