Explanations
words related to various forms of bias
Negative Logits
Ke      -0.15
sine    -0.15
vo      -0.14
SI      -0.14
OI      -0.14
自治    -0.13
_SI     -0.13
Bond    -0.13
poll    -0.13
ów      -0.13
Positive Logits
eczy            0.18
ogg             0.18
keit            0.16
kees            0.15
istrovství      0.15
ayd             0.15
nya             0.15
readcr          0.14
ContextHolder   0.14
μι              0.14
Activations Density 0.006%