Explanations
negations and their accompanying phrases
Negative Logits
avond              -0.64
+#+#               -0.63
حياتها             -0.57
iastes             -0.52
Demografia         -0.51
resourceCulture    -0.50
ⓧ                  -0.50
DockStyle          -0.49
حياته              -0.47
exitRule           -0.47
Positive Logits
mention            1.07
Mention            0.87
mentioning         0.79
mention            0.76
ISupport           0.72
mentions           0.72
Mention            0.70
forget             0.69
forgetting         0.65
виправивши         0.61
Activations Density 0.161%