INDEX
Explanations
punctuation, specifically the comma
Head Attr Weights
0:0.08
1:0.08
2:0.10
3:0.08
4:0.07
5:0.06
6:0.07
7:0.09
8:0.07
9:0.09
10:0.09
11:0.07
Negative Logits
�            -2.66
_>           -2.39
?:           -2.35
Jews         -2.33
Ire          -2.30
Americans    -2.26
yip          -2.24
SPONSORED    -2.21
Episode      -2.19
%:           -2.17
Positive Logits
meticulously  2.10
sacrament     2.10
offic         2.07
Brass         2.06
pharmacies    2.02
atories       2.00
ograph        1.97
Streets       1.97
SERV          1.97
rint          1.97
Activations Density 0.000%