INDEX
Explanations
pronouns referring to specific individuals
pronouns referring to individuals
Negative Logits
Token      Logit
assing     -0.79
atana      -0.74
opic       -0.73
adobe      -0.70
mble       -0.69
icycle     -0.68
assian     -0.67
igmatic    -0.66
ICO        -0.66
opia       -0.66
Positive Logits
Token      Logit
Majesty     0.86
majesty     0.83
rul         0.82
mos         0.68
persisted   0.67
zbollah     0.66
spoiled     0.65
markings    0.65
cared       0.64
'll         0.64
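The logit lists above rank vocabulary tokens by how strongly this feature's output raises or lowers their logits. Dashboards of this kind commonly obtain these values by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k most positive and k most negative tokens. The sketch below assumes that method. The function names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical, so it does not reproduce the numbers shown here.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding column."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive (descending) and k most negative (ascending) tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Invented toy data (2-dimensional residual stream, 4-token vocabulary);
# these are NOT the real model weights behind the lists above.
decoder_dir = [1.0, 0.5]
unembed = {
    "Majesty": [0.6, 0.5],
    "majesty": [0.5, 0.6],
    "opia": [-0.4, -0.5],
    "assing": [-0.5, -0.6],
}

top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
for tok, val in top:
    print(f"+ {tok:10s} {val:.2f}")
for tok, val in bottom:
    print(f"- {tok:10s} {val:.2f}")
```

Sorting once and slicing from both ends gives both lists from a single pass over the vocabulary. With a real model, the vocabulary has tens of thousands of tokens, and a partial sort (for example `heapq.nlargest`) would avoid sorting the whole list.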
Activations Density 0.416%
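Activation density is usually read as the fraction of sampled token positions on which the feature fires. A density of 0.416% means roughly 4 active positions per 1,000 tokens. The sketch below assumes "fires" means an activation strictly above zero. Both the threshold and the helper name `activation_density` are assumptions, not taken from this dashboard.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose feature activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # assumed firing criterion: strictly positive activation
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Invented sample: 1,000 positions, three of which are strictly positive.
acts = [0.0] * 996 + [1.2, 0.3, 0.0, 2.5]
density = activation_density(acts)
print(f"{density:.3%}")
```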