INDEX
Explanations
words or phrases conveying fairness and equality
Negative Logits
hip         -0.15
enk         -0.15
sucker      -0.14
.vo         -0.14
γά          -0.14
Interval    -0.14
seams       -0.14
coll        -0.14
cas         -0.13
ven         -0.13
Positive Logits
raki        0.15
jab         0.14
715         0.14
rupa        0.14
rup         0.14
ī´          0.14
ransition   0.14
gaard       0.14
ulton       0.14
Clr         0.14
Activations Density: 0.004%