INDEX
Explanations
mentions of a particular name
occurrences of specific letters in the text
Negative Logits
innocence    -0.71
Tsukuyomi    -0.64
disgust      -0.63
tresp        -0.63
auga         -0.62
spilled      -0.62
chwitz       -0.61
Tend         -0.60
hover        -0.60
$$$$         -0.59
Positive Logits
mony         0.77
andowski     0.75
atives       0.75
ths          0.75
idation      0.74
scription    0.72
thal         0.71
Nic          0.71
anium        0.68
quer         0.68
Activations Density: 0.092%