Explanations
Phrases related to social issues, particularly gender-related conflicts and political discussions.
Negative Logits
.).      -0.79
]).      -0.74
]."      -0.72
)).      -0.69
}.       -0.64
.'"      -0.62
).[      -0.62
].       -0.60
)."      -0.60
!).      -0.58
Positive Logits
ヘラ      0.56
izont    0.50
ノ        0.48
emale    0.48
akeru    0.46
reens    0.45
esides   0.45
NAME     0.43
renheit  0.43
aeda     0.42
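The logit lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and sort the result. The sketch below uses plain Python lists; `logit_effects`, `top_logits`, and the toy numbers are hypothetical illustrations, not any particular library's API.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project one feature's decoder direction through the unembedding.

    Assumption: decoder_dir has length d_model and unembed is a
    d_model x vocab matrix (rows = residual dims, columns = tokens).
    Returns the direct logit contribution for every vocabulary token.
    """
    d_model = len(decoder_dir)
    vocab = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
            for t in range(vocab)]


def top_logits(effects: Sequence[float], tokens: Sequence[str], k: int = 10
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, value) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending: most negative first
    positive = ranked[::-1][:k]    # descending: most positive first
    return negative, positive


if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
    effects = logit_effects([1.0, -0.5],
                            [[0.2, -0.4, 0.0, 0.6],
                             [0.4, 0.2, -0.2, 0.0]])
    negative, positive = top_logits(effects, ["a", "b", "c", "d"], k=2)
    print("negative:", negative)
    print("positive:", positive)
```

Sorting the whole vocabulary keeps the sketch short; for a real vocabulary of tens of thousands of tokens, `heapq.nsmallest` and `heapq.nlargest` with `key=` would avoid the full sort.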
Activation Density: 2.279%
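Activation density is the share of sampled tokens on which the feature fires. Assuming "fires" means an activation strictly above zero (the page states neither the threshold nor the sample size), 2.279% corresponds to roughly 23 firing tokens per 1,000. A minimal sketch with a hypothetical `activation_density` helper:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of activations strictly above `threshold`.

    Assumption: density = firing tokens / all sampled tokens, and a token
    "fires" when its activation exceeds 0; the page does not state either.
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


if __name__ == "__main__":
    # Toy sample: 1 of 4 tokens has a positive activation.
    print(activation_density([0.0, 1.2, 0.0, 0.0]))  # 25.0
```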