INDEX
Explanations
mentions of a specific male subject or character in various contexts
Negative Logits
CloseOperation   -0.53
Personensuche    -0.51
,                -0.51
EDEFAULT         -0.49
layui            -0.48
stalo            -0.46
NewGuid          -0.46
—                -0.45
seamnă           -0.45
jsou             -0.44
Positive Logits
himself          1.12
hehe             0.99
eding            0.94
himself          0.89
hehehe           0.88
too              0.83
arken            0.82
eded             0.81
fting            0.76
Himself          0.72
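Logit lists like the two above are usually read as the feature's direct effect on the model's output tokens. Below is a minimal sketch of how such lists could be produced. It assumes, though this page does not state it, that each token's score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix, and that the lists are simply the k highest and k lowest scores. The names `logit_effects` and `top_and_bottom`, along with the toy matrices, are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers: assumes the dashboard's logit lists come from the
# feature's decoder direction multiplied by the unembedding matrix.


def logit_effects(
    decoder_row: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Score every token: dot product of the decoder row (length d_model)
    with that token's column of the unembedding matrix (d_model x vocab)."""
    return [
        (token, sum(d * unembed[i][t] for i, d in enumerate(decoder_row)))
        for t, token in enumerate(vocab)
    ]


def top_and_bottom(
    scores: List[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring tokens (descending) and the k
    lowest-scoring tokens (ascending, most negative first)."""
    positive = sorted(scores, key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(scores, key=lambda kv: kv[1])[:k]
    return positive, negative


# Toy example: d_model = 2 and a 4-token vocabulary (values are made up).
decoder_row = [1.0, -0.5]
unembed = [
    [0.8, 0.2, -0.1, 0.0],
    [-0.6, 0.4, 0.6, 0.2],
]
vocab = ["himself", "hehe", "jsou", "layui"]

positive, negative = top_and_bottom(logit_effects(decoder_row, unembed, vocab), k=2)
print("positive:", positive)
print("negative:", negative)
```

With these toy values, `himself` scores about 1.1 and `jsou` about -0.4, so they head the positive and negative lists respectively. Returning a list of pairs rather than a dict prevents distinct tokens that render identically from overwriting each other. That may explain why `himself` appears twice above, for example with and without a leading space.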
Activation Density: 0.193%
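Activation density is commonly taken to mean the fraction of tokens in a sample on which the feature fires. Under that assumption, 0.193% works out to roughly 1.9 firing tokens per 1,000. The sketch below treats any activation above a hypothetical `threshold` of 0 as firing. The function name and the sample data are illustrative, not taken from this page.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0)."""
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return firing / total


# Made-up sample: 193 firing tokens out of 100,000.
sample = [0.7] * 193 + [0.0] * (100_000 - 193)
density = activation_density(sample)
print(f"{density:.3%}")  # 0.193%
```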