INDEX
Explanations
structures related to code syntax and punctuation
Negative Logits
of            -0.77
fran          -0.73
widetilde     -0.69
top           -0.63
un            -0.63
P             -0.62
de            -0.62
vol           -0.61
bos           -0.61
zun           -0.60
Positive Logits
;");          1.62
$;            1.61
;;;           1.60
;;;;          1.50
,:);          1.50
?;            1.49
]';           1.49
()]);         1.47
}$;           1.46
;";           1.46
Activations Density 0.398%