Explanations
negative phrases or indications
Negative Logits
Token            Logit
NUMX             -0.95
continúas        -0.93
featureID        -0.91
хьтан            -0.88
RectangleBorder  -0.85
متعلقه           -0.79
ChildScrollView  -0.79
^(@)             -0.79
НИК              -0.78
Portale          -0.77
Positive Logits
Token            Logit
’                0.54
=-               0.53
were             0.51
-                0.51
(-               0.51
.-               0.49
,=               0.49
=                0.48
شدند             0.47
'                0.47
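The page does not say how the two logit lists are produced. A common approach, assumed here rather than taken from this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the k lowest and k highest entries. The sketch below uses NumPy; `top_logits`, `decoder_vec`, `W_U`, and `vocab` are hypothetical names chosen for illustration.

```python
import numpy as np

def top_logits(decoder_vec, W_U, vocab, k=10):
    """Return (negative, positive) lists of (token, logit) pairs.

    Assumption: a feature's logit effect is its decoder direction
    (shape: d_model) projected through an unembedding matrix W_U
    (shape: d_model x vocab_size). Not confirmed by the source page.
    """
    logits = decoder_vec @ W_U  # one logit per vocabulary entry
    order = np.argsort(logits)  # indices sorted from lowest to highest logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional model, 4-token vocabulary.
decoder_vec = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.9, 0.1, -0.2],
                [0.0, 0.0, 0.0, 0.0]])
neg, pos = top_logits(decoder_vec, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # [('b', -0.9), ('d', -0.2)]
print(pos)  # [('a', 0.5), ('c', 0.1)]
```

Sorting once with `argsort` and slicing from both ends yields both lists in a single pass; for a full vocabulary, `np.argpartition` would avoid sorting every entry.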
Activation Density: 0.641%
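Activation density is usually read as the percentage of tokens on which the feature fires, so 0.641% corresponds to roughly 6 to 7 tokens in every 1,000. The sketch below assumes "fires" means an activation strictly above zero; `activation_density` and its `threshold` parameter are hypothetical, not taken from this page.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature "fires" on a token when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

print(activation_density([0.0, 0.3, 0.0, 1.2]))  # 50.0
# 641 firing tokens out of 100,000 gives the density shown above.
print(activation_density([1.0] * 641 + [0.0] * (100_000 - 641)))
```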