Explanations
semicolons and punctuation marks in the text
Negative Logits
fran       -0.76
of         -0.68
alan       -0.63
vol        -0.60
up         -0.59
dead       -0.59
widetilde  -0.59
de         -0.57
glo        -0.57
tps        -0.57
Positive Logits
$;     1.79
;      1.66
}$;    1.64
+;     1.57
.;     1.56
;;;    1.53
%;     1.52
;;     1.50
_;     1.50
;;;;   1.49
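The negative and positive logit lists above rank vocabulary tokens by how strongly this feature pushes their next-token logits down or up. On SAE feature dashboards this is commonly computed by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k most negative and k most positive entries. The sketch below assumes that convention; the function name `feature_logit_effects`, its arguments, and the toy matrices are hypothetical and not taken from this page.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative, most positive) token logit effects.

    Assumption (hypothetical convention): the effect on token j is the dot
    product of the feature's decoder direction (length d_model) with column j
    of the unembedding matrix (shape d_model x vocab_size).
    """
    d_model = len(decoder_dir)
    # One score per vocabulary entry: decoder_dir . unembed[:, j]
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score, so the front holds the most negative effects.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = feature_logit_effects(
    [1.0, -0.5],
    [[0.2, -0.4, 0.9], [0.6, 0.2, -0.4]],
    ["of", "fran", ";"],
    k=1,
)
print(neg, pos)
```

In the toy example, `fran` ends up as the most negative token and `;` as the most positive one, mirroring the shape (not the values) of the lists above.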
Activation Density: 0.221%
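Activation density is the share of sampled token positions on which the feature fires. A density of 0.221% means it fires on roughly one token in 450 (100 / 0.221 ≈ 452). The minimal sketch below assumes a feature "fires" when its activation is strictly above a threshold of 0.0; the page does not state the threshold or the sample size, and the function name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature fires.

    Assumption: "fires" means activation > threshold (0.0 by default),
    measured over some sample of token positions.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# One firing position out of four gives a density of 25%.
print(activation_density([0.0, 1.2, 0.0, 0.0]))
```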