Explanations
punctuation marks, particularly periods
Negative Logits
awtextra      -0.76
extAlignment  -0.73
khid          -0.71
Stit          -0.69
gjenge        -0.67
jectures      -0.67
agré          -0.66
Hig           -0.65
مض            -0.63
ぐれ          -0.63
Positive Logits
])).          1.16
()].          1.15
$.}           1.12
']").         1.08
]").          1.05
__).          1.04
}}$.          1.02
\.            0.98
"]').         0.98
).}           0.97
Activations Density 0.458%