INDEX
Explanations
specific symbols or characters, particularly currency symbols and special characters
Negative Logits
ilitating    -0.72
wip          -0.69
livest       -0.68
itaire       -0.67
ktop         -0.66
kered        -0.66
mainline     -0.65
terness      -0.62
ecause       -0.62
ilitation    -0.62
POSITIVE LOGITS
ĺ        0.90
Ĵ        0.89
ģ        0.88
itude    0.86
ãĥ³      0.84
ĭ        0.83
arters   0.78
ments    0.77
į        0.76
Ģ        0.76
Activations Density: 0.004%