INDEX
Explanations
negation or denial statements
Negative Logits
.appspot    -0.16
.prot       -0.15
gree        -0.15
_keeper     -0.15
YC          -0.15
iqueta      -0.14
lemn        -0.14
irie        -0.14
lero        -0.14
ntl         -0.14
Positive Logits
ono         0.15
ore         0.15
zz          0.15
sp          0.14
gender      0.14
ody         0.14
ove         0.14
oks         0.14
ella        0.14
further     0.13
Activations Density 0.065%