Explanations
references to specific characters or cast members in a narrative
Negative Logits
Token       Logit
erer        -0.19
cke         -0.17
annes       -0.17
geh         -0.16
zeigen      -0.16
er          -0.16
udson       -0.16
jez         -0.16
suppress    -0.15
652         -0.15
Positive Logits
Token       Logit
igated      0.33
aways       0.33
away        0.32
igation     0.31
igate       0.28
ellan       0.28
iron        0.27
les         0.27
ings        0.26
ell         0.24
Activation Density: 0.020%