Explanations
pronouns indicating a group or individual performing an action
references to groups of people or collective entities
Negative Logits
Token       Logit
atlantic    -0.70
ラン        -0.69
lihood      -0.68
atory       -0.67
iens        -0.66
Rough       -0.65
Jindal      -0.64
Eleven      -0.64
DAY         -0.63
20439       -0.61
Positive Logits
Token       Logit
've         1.19
'll         1.18
're         1.14
'd          1.05
dunno       0.86
wand        0.85
cannot      0.84
listened    0.83
drank       0.83
wandered    0.82
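
For readers unfamiliar with these lists, here is a minimal sketch of one common way such per-token scores are produced. It assumes each score is the dot product of the feature's output direction with a token's unembedding vector. The source does not state how its values were computed, so this is an assumption. The function names, placeholder tokens, and all numbers below are hypothetical and are not taken from the dashboard.

```python
def logit_effects(direction, unembed):
    """Dot the feature direction with each token's unembedding vector.

    ASSUMPTION: this is one common way to score tokens for a feature;
    the source does not confirm that its values were computed this way.
    """
    return {
        token: sum(d * w for d, w in zip(direction, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Toy two-dimensional example with made-up vectors (hypothetical data).
direction = [1.0, -0.5]
unembed = {
    "tok_a": [1.0, 0.0],
    "tok_b": [0.5, -0.5],
    "tok_c": [-0.5, 0.5],
    "tok_d": [0.0, 0.0],
}

effects = logit_effects(direction, unembed)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", positive)
print("negative:", negative)
```

Under this reading, a high positive score means the feature pushes the model toward predicting that token when it fires, and a negative score means it pushes away from it.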
Activations Density 0.633%
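
As a rough illustration of what a density figure like this usually means, the sketch below assumes it is the fraction of sampled tokens on which the feature's activation is above zero. The source does not define the term, so both the definition and the helper name `activation_density` are assumptions, and the sample values are made up.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    ASSUMPTION: "density" is taken to mean the share of sampled tokens
    on which the feature fires; the source does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 3 of 1000 token positions.
sample = [0.0] * 997 + [0.4, 1.2, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

By this definition, 0.633% would correspond to the feature firing on roughly 6 of every 1000 sampled tokens.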