Explanations
words that express strong emotional states or conflicts
Negative Logits
Token         Logit
父子           -0.79
的男人          -0.78
muž           -0.75
pria          -0.71
👬            -0.70
męski         -0.70
Mr            -0.69
masculina     -0.69
mannen        -0.69
brotherhood   -0.68
Positive Logits
Token         Logit
woman         1.78
lady          1.69
women         1.63
female        1.58
woman         1.56
Woman         1.53
girl          1.45
women         1.45
Woman         1.43
Women         1.42
Activations Density 1.162%
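
The density figure above is not defined on this page. A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is nonzero. The sketch below illustrates that assumed definition only. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from the dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: density = (# active positions) / (# positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 12 of them active.
sample = [0.0] * 988 + [0.5] * 12
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 1.162% would mean the feature fires on roughly 1 in 86 sampled tokens. The tool's actual sampling and threshold may differ.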