INDEX
Explanations
references to specific characters or names in a narrative context
Negative Logits
aldi      -0.15
ram       -0.15
mess      -0.14
Ruiz      -0.14
enez      -0.14
api       -0.14
usz       -0.14
ustral    -0.13
095       -0.13
Py        -0.13
Positive Logits
assi      0.18
arring    0.18
eview     0.17
oney      0.16
alion     0.16
umpt      0.15
Ñģов      0.15
aks       0.15
elian     0.15
Alban     0.15
Activations Density 0.027%