Explanations
instances of punctuation or dashes that indicate pauses or breaks in text
Negative Logits
åķ        -0.17
fé        -0.16
ellido    -0.14
acman     -0.14
xec       -0.14
ingga     -0.14
auc       -0.14
eed       -0.14
oth       -0.13
ilder     -0.13
Positive Logits
sdale     0.15
oret      0.15
lich      0.15
equals    0.15
both      0.14
rare      0.14
NX        0.14
uni       0.13
both      0.13
ãĥ¼ãĥ      0.13
Activations Density: 0.127%