INDEX
Explanations
pronouns referring to unspecified entities
references to entities or objects being discussed in the text
Negative Logits
Token    Logit
••       -0.74
mire     -0.69
★★       -0.68
Iowa     -0.68
odge     -0.67
amer     -0.67
Party    -0.66
SPA      -0.66
ILE      -0.64
ORDER    -0.63
Positive Logits
Token      Logit
atically   1.29
selves     1.29
selves     1.26
atic       1.14
self       0.96
conduc     0.83
atics      0.82
behav      0.73
MpServer   0.72
orally     0.72
Activations Density 0.155%