INDEX
Explanations
actions and roles being exchanged or substituted among characters
Negative Logits
Token        Logit
ungal        -0.15
Jon          -0.14
uzey         -0.14
woff         -0.14
nder         -0.14
ystore       -0.13
lc           -0.13
URE          -0.13
acquaint     -0.13
lection      -0.13
Positive Logits
Token          Logit
replace        0.31
replacement    0.28
replace        0.27
replaced       0.26
replacing      0.26
replacement    0.26
replacements   0.26
代             0.25
.replace       0.25
replaces       0.24
Activations Density: 0.087%