INDEX
Explanations
references to specific characters and plot elements in a narrative context
Negative Logits
anna        -0.17
emmel       -0.15
culos       -0.15
sted        -0.14
hecho       -0.14
cient       -0.14
do          -0.14
weg         -0.13
sacrifice   -0.13
ea          -0.13
Positive Logits
mesma       0.19
que         0.19
sua         0.18
qui         0.18
inda        0.18
ép          0.18
ugins       0.17
í           0.17
próp        0.16
eron        0.16
Activations Density 0.009%