INDEX
Explanations
references to commonly accepted ideas or standards
Negative Logits
asus      -0.78
kamp      -0.74
rogram    -0.68
romeda    -0.67
Drug      -0.66
haw       -0.66
zik       -0.65
mented    -0.64
acus      -0.63
atoon     -0.63
Positive Logits
ised          1.13
ization       1.09
isation       1.01
izations      0.96
wisdom        0.94
fare          0.93
cy            0.91
deviation     0.90
ized          0.89
sized         0.89
Activations Density: 3.986%