Explanations
references to the name "Carol" and its variants
Negative Logits
ess      -0.18
ettle    -0.16
geben    -0.16
ovice    -0.15
etary    -0.15
ettes    -0.15
etti     -0.14
evi      -0.14
escal    -0.14
plier    -0.14
Positive Logits
ynn          0.27
yn           0.25
Ann          0.21
inas         0.19
ien          0.19
thers        0.19
inea         0.19
YN           0.18
Ann          0.17
otherwise    0.16
Activations Density: 0.005%