Explanations
references to a specific individual, likely in a literary or artistic context
Negative Logits
ctal      -0.19
ulling    -0.18
ši        -0.15
ulis      -0.15
erview    -0.14
ulse      -0.14
oog       -0.14
chatte    -0.14
iction    -0.14
rens      -0.14
Positive Logits
abeth      0.50
abet       0.29
beth       0.26
ondo       0.19
izabeth    0.17
eph        0.17
nore       0.16
âng        0.16
aph        0.16
ivet       0.16
Activation Density: 0.009%