INDEX
Explanations
expressions of opinion or conclusion statements in discussions or analyses
Negative Logits
featureID        -0.55
régal            -0.48
нгред            -0.47
SEDS             -0.45
bootstrapcdn     -0.45
tagext           -0.44
LabelTagHelper   -0.44
敺               -0.44
tuyo             -0.44
Regents          -0.43
Positive Logits
justifying       0.41
nakalista        0.40
argument         0.39
justified        0.38
שוליים           0.38
gezegd           0.37
arXiv            0.37
arguments        0.37
convincing       0.37
XmlAccessorType  0.36
Activations Density: 1.313%