Explanations
punctuation marks, particularly semicolons and periods
Negative Logits
Token        Logit
fran         -0.76
widetilde    -0.72
alan         -0.67
ondy         -0.66
ers          -0.64
of           -0.64
nungs        -0.63
P            -0.62
tps          -0.62
جه           -0.62
Positive Logits
Token        Logit
$;           1.44
;;;          1.31
;;;;         1.27
AndEndTag    1.22
_;           1.22
icolon       1.22
+;           1.19
}$;          1.18
__;          1.15
,:);         1.14
Activations Density: 0.213%