INDEX
Explanations
Keywords related to equality and discrimination, particularly identity attributes such as race, color, and sexual orientation.
Negative Logits
Token       Logit
<eos>       -0.67
in          -0.64
↵↵          -0.63
hyrchwyd    -0.60
.           -0.59
(blank)     -0.56
(           -0.56
is          -0.54
])));       -0.52
so          -0.50
Positive Logits
Token             Logit
kasarigan         1.13
itſelf            1.00
protoimpl         0.99
חיצוניים          0.98
CURIAM            0.97
Мексичка          0.97
שוליים            0.95
Administrativna   0.90
kaynağından       0.88
^(@)              0.87
Activations Density: 0.433%