INDEX
Explanations
references to plot twists and shocking elements in narratives
Negative Logits
ho             -0.71
ighth          -0.70
ittee          -0.69
elson          -0.69
largeDownload  -0.68
alty           -0.68
fort           -0.67
idated         -0.67
pora           -0.66
easing         -0.66
Positive Logits
happened       0.96
happens        0.94
Happ           0.91
ILE            0.89
?!             0.84
ensued         0.81
TF             0.80
!?             0.77
happ           0.76
TY             0.74
Activations Density: 0.004%