Explanations
phrases related to a particular person's name
words related to specific character names or titles
Negative Logits
unacceptable    -0.69
unaff           -0.68
clubhouse       -0.66
friendship      -0.64
quicker         -0.63
signing         -0.63
unf             -0.63
liking          -0.62
mates           -0.61
improvement     -0.61
Positive Logits
chin            2.79
iren            1.59
GI              1.57
zhen            1.38
olin            1.37
rin             1.31
uchin           1.24
Chin            1.14
irin            1.10
veil            1.08
Activations Density: 0.019%