INDEX
Explanations
references to specific characters and events in the Twilight series
Negative Logits
595          -0.15
rale         -0.15
itra         -0.15
otron        -0.15
GIN          -0.14
flaming      -0.14
Attribution  -0.14
Koch         -0.14
eldorf       -0.14
anh          -0.13
Positive Logits
Twilight     0.32
Eclipse      0.25
Bella        0.24
vampire      0.22
Fork         0.22
vamp         0.22
.eclipse     0.22
okane        0.22
clipse       0.21
twilight     0.21
Activations Density: 0.012%