INDEX
Explanations
key terms related to incidents and evaluations of fairness in situations
Negative Logits
ãĥ¼ãĥ³    -0.16
Heb    -0.15
urette    -0.15
ÑĢод    -0.14
thous    -0.14
surname    -0.14
mlin    -0.14
036    -0.14
.Modules    -0.13
sik    -0.13
Positive Logits
eco    0.16
iners    0.15
èĸ    0.14
.TabStop    0.14
doc    0.14
ẩm    0.14
cher    0.14
è±    0.13
Blond    0.13
acles    0.13
Activations Density: 0.012%