INDEX
Explanations
pronouns followed by verbs indicating action
pronouns and references to groups of individuals
Negative Logits
ahime      -0.74
RTX        -0.65
ãĥ¼ãĥĨ     -0.63
Correct    -0.62
00000      -0.61
=-=-       -0.60
rium       -0.59
bridge     -0.58
Press      -0.58
politics   -0.58
Positive Logits
were         1.00
are          0.98
perished     0.92
relate       0.84
belong       0.84
involve      0.82
consisted    0.81
originated   0.81
reside       0.81
have         0.77
Activations Density 0.072%
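
The dashboard does not say how the density figure is computed. As a rough sketch, assuming it is the share of sampled token positions on which the feature has a non-zero activation, it could be reproduced as follows. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold.

    Assumption: "density" means the fraction of positions where the feature
    fires at all; the dashboard itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: 10,000 token positions, 7 of them with a non-zero activation.
toy = [0.0] * 9993 + [1.3, 0.4, 2.1, 0.9, 0.2, 3.0, 0.7]
print(f"{activation_density(toy):.3f}%")
```

Under this reading, a density of 0.072% would mean the feature fires on roughly 7 of every 10,000 sampled tokens.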