Explanations
negations or phrases that express doubt or denial
Negative Logits
resourceCulture    -0.66
nonUne             -0.54
menudo             -0.52
spesso             -0.51
kasarigan          -0.51
constamment        -0.51
sering             -0.48
sometimes          -0.48
sometimes          -0.46
často              -0.46
Positive Logits
deterred           0.81
tagHelperRunner    0.72
DockStyle          0.69
SequentialGroup    0.68
="@+               0.66
الرياضيه           0.61
hamdu              0.61
entiment           0.60
flin               0.60
perturbed          0.59
Activations Density: 0.246%