INDEX
Explanations
mentions of boys or male characters in various contexts
NEGATIVE LOGITS
InputDecoration  -0.89
″]               -0.80
>")              -0.79
IsContent        -0.79
                 -0.78
AndEndTag        -0.78
PMC              -0.74
للمعارف          -0.74
"]}              -0.74
')}              -0.74
POSITIVE LOGITS
boys   1.88
Boys   1.83
boy    1.83
Boy    1.83
Boy    1.82
BOY    1.78
BOYS   1.76
boy    1.75
Boys   1.67
boys   1.64
Activations Density 0.032%