Explanations
referenced pronouns, particularly the first-person singular "I"
Negative Logits
um     -0.58
ide    -0.58
an     -0.57
ha     -0.56
ah     -0.55
el     -0.55
en     -0.55
s      -0.54
et     -0.54
ar     -0.54
Positive Logits
i                1.19
parsedMessage    0.79
ii               0.69
kaarangay        0.67
aarrggbb         0.63
rungsseite       0.61
iii              0.60
SwitchCompat     0.60
oredCriteria     0.60
ویکیپدی          0.59
Activations Density: 0.541%