INDEX
Explanations
mentions of male pronouns
references to a specific individual or pronouns related to a person
Negative Logits
Token      Logit
history    -0.66
Manip      -0.65
Dialogue   -0.65
Interest   -0.62
change     -0.60
Combine    -0.60
Beta       -0.60
Gems       -0.60
Jugg       -0.59
ylon       -0.58
Positive Logits
Token      Logit
Majesty    1.06
ctor       0.87
zbollah    0.85
panic      0.83
eded       0.81
uristic    0.79
ufact      0.77
bert       0.77
eding      0.75
aney       0.73
Activations Density: 0.404%