INDEX
Explanations
words related to the evaluation of social norms and expectations regarding women's roles
Negative Logits
etc          -0.39
подоб        -0.38
forChild     -0.38
Ordin        -0.38
eikä         -0.38   (Finnish: "and not, nor")
Etc          -0.37
Etc          -0.36
既           -0.36   (Chinese: "already")
sice         -0.36
offline      -0.36
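Positive Logits
sebaliknya    0.97   (Indonesian: "on the contrary")
наоборот      0.85   (Russian: "on the contrary")
juist         0.85   (Dutch: "precisely")
conversely    0.82
justru        0.78   (Indonesian: "precisely, instead")
downright     0.75
inkább        0.75   (Hungarian: "rather")
malah         0.74   (Indonesian/Malay: "instead")
vielmehr      0.74   (German: "rather")
それとも       0.73   (Japanese: "or else")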
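The page does not say how these logit lists were produced. A common convention, assumed here, is to multiply the feature's decoder direction by the model's unembedding matrix and keep the k highest and k lowest token scores. The minimal pure-Python sketch below follows that assumption. The function name top_logit_tokens, the parameters decoder_dir and unembed, and the toy vocabulary are all hypothetical.

```python
def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Return the k most positive and k most negative (token, score) pairs.

    Hypothetical layout: unembed[i][j] is the weight from residual
    dimension i to vocabulary token j, so a token's score is the dot
    product of the feature's decoder direction with column j.
    """
    scores = []
    for j, token in enumerate(vocab):
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
decoder_dir = [1.0, 0.5]
unembed = [
    [1.0, 0.0, -1.0, 0.2],
    [0.0, 1.0, 0.0, -2.0],
]
vocab = ["conversely", "rather", "etc", "offline"]
positive, negative = top_logit_tokens(decoder_dir, unembed, vocab, k=2)
print(positive)  # [('conversely', 1.0), ('rather', 0.5)]
print(negative)
```

Real dashboards work with full-size matrices and may apply extra steps such as normalization. The toy sizes here only show how the ranking step works.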
Activations Density 0.881%
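Activation density is usually read as the share of token positions on which the feature fires. On that reading, 0.881% means roughly 881 firings per 100,000 tokens. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. The function name activation_density is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where a feature's activation exceeds threshold.

    Assumption: "firing" means activation > threshold (0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy example: the feature fires on 881 of 100,000 token positions.
acts = [0.0] * (100_000 - 881) + [1.7] * 881
density = activation_density(acts)
print(f"{density:.3%}")  # 0.881%
```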