INDEX
Explanations
mentions of young boys or boys in various contexts
mentions of boys, particularly in distressing or dangerous situations
Negative Logits
aeda          -0.77
Associates    -0.74
Locations     -0.67
Places        -0.67
workplaces    -0.67
ModLoader     -0.66
ENCE          -0.64
indal         -0.64
location      -0.63
Decoder       -0.63
Positive Logits
ishly         1.01
hood          0.89
children      0.87
leukemia      0.86
boy           0.86
friend        0.83
girl          0.80
whom          0.78
ish           0.77
crow          0.77
Activations Density 0.074%
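
The density figure is not defined on this page. One common reading, assumed here, is the fraction of sampled tokens on which the feature fires at all. The sketch below illustrates that reading only: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    activation. The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, 74 of which activate the feature.
sample = [0.0] * 100_000
for i in range(74):
    sample[i * 1000] = 1.5  # arbitrary positive activation value

density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumption, a density of 0.074% would correspond to roughly 74 active tokens per 100,000 sampled.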