Explanations
terms and phrases related to social dialogues and discussions about gender norms in various contexts
Negative Logits
Token         Logit
)";           -1.20
'},           -1.10
"},           -1.04
`,            -1.03
"],           -1.01
"])           -1.01
`;            -0.99
}")           -0.99
"];           -0.99
',            -0.99
Positive Logits
Token         Logit
Á             0.53
Nord          0.52
ú             0.51
Nor           0.51
awt           0.51
doubt         0.48
gü            0.48
Eagles        0.48
лу            0.48
BeginContext  0.47
Activations Density 2.463%