Explanations
mentions of stories or narratives
references to various narratives or stories
Negative Logits
Token       Logit
ividual     -0.84
erate       -0.82
oved        -0.82
foreseen    -0.81
ierrez      -0.80
uters       -0.76
rador       -0.76
activated   -0.76
raviolet    -0.75
essors      -0.75
Positive Logits
Token       Logit
tale        1.45
tales       1.40
Tales       1.04
Tale        1.02
tale        0.95
saga        0.83
Ragnarok    0.83
Leviathan   0.82
tell        0.78
Mania       0.77
Activations Density: 0.015%