INDEX
Explanations
references to a character named Sarah
Negative Logits
tti      -0.17
kor      -0.15
ega      -0.15
aal      -0.14
Roose    -0.14
che      -0.14
luk      -0.14
irty     -0.14
rav      -0.14
ardo     -0.14
Positive Logits
Jane     0.19
Palin    0.19
Jane     0.18
acen     0.17
梨       0.17
ertz     0.17
plain    0.17
cuda     0.16
ãĥªãĤ«   0.16
Beth     0.16
Activations Density 0.010%