Explanations
instances of the word "wrong" and related concepts of wrongdoing
Negative Logits
ſhe       -0.63
houſe     -0.57
awsze     -0.56
ſtate     -0.56
disfraz   -0.53
niyang    -0.52
Armour    -0.51
puntata   -0.51
perſon    -0.51
bicara    -0.50
Positive Logits
pregnant           1.14
wrong              1.07
Pregnant           1.04
pregnant           0.98
pregnancy          0.98
Pregnancy          0.91
MigrationBuilder   0.88
reproduction       0.87
Reproduction       0.85
wrong              0.85
Activations Density 0.075%