Explanations
pronouns and possessive pronouns indicating a person
references to personal pronouns and their corresponding contexts
Negative Logits
etheless        -0.70
amily           -0.59
IVERS           -0.55
andel           -0.54
unden           -0.52
itely           -0.50
Prompt          -0.50
cancellation    -0.49
alike           -0.48
both            -0.48
Positive Logits
/               1.43
panic           1.25
/               1.10
/.              1.05
/#              1.05
or              1.05
/,              1.04
/$              1.01
/_              0.96
/(              0.95
Activations Density: 0.258%