INDEX
Explanations
punctuation marks, specifically commas and periods
Negative Logits
otto        -0.16
aises       -0.15
гÑĢи         -0.14
bff         -0.14
ieri        -0.14
aise        -0.14
amburger    -0.14
igner       -0.13
amat        -0.13
evi         -0.13
Positive Logits
(~(         0.15
åģ          0.14
ANJI        0.14
dispatch    0.14
untu        0.14
extensions  0.13
ATTERN      0.13
ائÙĤ        0.13
ongyang     0.13
Dispatch    0.13
Activations Density: 0.181%