INDEX
Explanations
references to significant achievements or events in women's history
Negative Logits
“
-0.22
“[
-0.21
‘
-0.19
’
-0.18
”
-0.17
(“
-0.15
Erf
-0.14
’B
-0.14
’,
-0.14
’T
-0.14
Positive Logits
's
0.27
='
0.18
('
0.17
'est
0.16
5
0.16
'es
0.15
'id
0.15
,'
0.15
'ı
0.15
0
0.15
Activations Density 0.024%