INDEX
Explanations
untested or unverified information or claims
Negative Logits
Token     Logit
anwhile   -0.88
å§«       -0.76
hyde      -0.73
phrine    -0.72
SHIP      -0.71
Pigs      -0.67
briefs    -0.67
cium      -0.66
ŃĶ        -0.66
*/(       -0.65
Positive Logits
Token     Logit
ruly      1.11
itled     1.05
ested     0.97
rave      0.96
ribut     0.94
enable    0.94
ired      0.92
ainted    0.92
race      0.89
oward     0.89
Activations Density: 6.698%