INDEX
Explanations
common nouns referring to different individuals
references to people involved in a narrative or situation
Negative Logits
require     -0.68
arning      -0.67
ilver       -0.64
respect     -0.62
olars       -0.62
Length      -0.62
Members     -0.61
ourcing     -0.60
ornia       -0.60
ystem       -0.59
Positive Logits
himself      0.87
liest        0.77
stown        0.73
iest         0.72
's           0.71
claimant     0.69
regretted    0.67
herself      0.67
promptly     0.66
complainant  0.66
Activations Density: 0.285%