Explanations
proper names, specifically those related to historical figures and locations
Negative Logits
Token         Logit
skirts        -0.66
âĶģ           -0.60
åŃ            -0.58
DIV           -0.56
icable        -0.55
ODUCT         -0.53
é¾įå¥ij士     -0.52
Forbidden     -0.52
Concern       -0.51
earance       -0.50
Positive Logits
Token         Logit
hart          0.86
ardo          0.82
Trotsky       0.79
hardt         0.78
idas          0.76
ette          0.72
hard          0.71
orian         0.69
utenant       0.69
idates        0.69
Activations Density: 6.304%