INDEX
Explanations
affirmative phrases or statements expressing satisfaction
Negative Logits
Tato      -0.17
uki       -0.17
lector    -0.16
obe       -0.14
arov      -0.14
itself    -0.14
erable    -0.14
оÑĩки     -0.14
olt       -0.14
ours      -0.14
Positive Logits
sure      0.15
yll       0.15
edy       0.15
Proud     0.15
currently 0.15
apr       0.14
edImage   0.14
/Dk       0.14
usz       0.14
edo       0.14
Activations Density 0.073%