INDEX
Explanations
specific programming or mathematical syntax elements
Negative Logits
Token           Logit
nosaurus        -0.61
}';             -0.60
dymyr           -0.60
quirer          -0.59
[toxicity=0]    -0.58
fecture         -0.55
-               -0.55
orrhea          -0.53
AutoField       -0.53
quium           -0.53
Positive Logits
Token           Logit
myſelf          1.30
himſelf         1.26
itſelf          1.23
ſelves          1.15
Anſ             1.15
Jefus           1.14
themſelves      1.12
ſelf            1.12
Conſ            1.12
Reſ             1.10
Activations Density: 0.930%