Explanations
negative statements or expressions in the text
not followed by a state
Negative Logits
שוליים          -0.59
المعيارى        -0.46
orage           -0.44
InitStruct      -0.42
fieldNum        -0.41
✭✭              -0.41
+#+#            -0.40
utafitiHapana   -0.40
juice           -0.40
foria           -0.40
Positive Logits
Италијани       0.42
saudável        0.40
publiques       0.39
neither         0.39
przypad         0.38
Szw             0.37
assertFalse     0.37
DockStyle       0.36
démocr          0.36
Neither         0.36
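The dashboard does not say how these logit values are produced. A common approach in sparse-autoencoder dashboards, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and list the tokens with the largest and smallest scores. The following is a minimal pure-Python sketch with toy sizes; `feature_logit_effects` and `top_and_bottom` are hypothetical names, and real pipelines may apply a final layer norm or other scaling that is omitted here.

```python
def feature_logit_effects(decoder_dir, unembed):
    """Project a feature's decoder direction through an unembedding matrix.

    decoder_dir: list of d_model floats (assumed: the feature's decoder row).
    unembed: d_model rows, each a list of vocab-size floats (assumed W_U).
    Returns one score per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores, tokens, k):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]                 # most negative effects, ascending
    top = list(reversed(ranked[-k:]))   # most positive effects, descending
    return bottom, top


# Toy example: d_model = 2, vocabulary of 3 hypothetical tokens.
scores = feature_logit_effects([1.0, -0.5], [[0.2, 0.4, -0.1], [0.6, -0.2, 0.3]])
negative, positive = top_and_bottom(scores, ["a", "b", "c"], 1)
print(negative, positive)
```

Sorting once and slicing both ends keeps the sketch short; for a realistic vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest` and `heapq.nlargest` would avoid a full sort.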
Activation density: 0.139%
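Assuming activation density means the percentage of dataset tokens on which the feature's activation is above zero, the figure could be computed as in the sketch below. The function name `activation_density` and the default zero threshold are illustrative assumptions, not details taken from this dashboard. Under that reading, 0.139% corresponds to the feature firing on roughly one token in 720.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    activations: per-token activation values for one feature (assumed input).
    Returns 0.0 for an empty input instead of dividing by zero.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: the feature fires on 2 of 5 tokens.
print(activation_density([0.0, 0.0, 1.2, 0.0, 0.3]))
```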