Explanations
titles or formats used to describe literary works, movies, or plays
references to novels and films, particularly adaptations and genres
Negative Logits
okin     -0.82
orthy    -0.82
cffff    -0.80
xus      -0.77
aples    -0.76
achev    -0.76
inks     -0.73
usra     -0.72
rians    -0.72
nyder    -0.72
Positive Logits
Artemis   1.05
Hannah    0.88
Barney    0.86
Notting   0.84
Samuel    0.82
Annie     0.81
Rudolph   0.79
Martha    0.78
Esther    0.78
Oscar     0.77
Activations Density 0.204%