INDEX
Explanations
adverbs that indicate certainty or frequency
Negative Logits
is            -0.69
was           -0.57
are           -0.57
were          -0.48
themſelves    -0.39
will          -0.35
هو            -0.35
は            -0.34
هي            -0.33
'             -0.32
Positive Logits
AddTagHelper      0.61
belonged          0.59
seemed            0.55
DeleteBehavior    0.55
been              0.55
emailAlready      0.55
الحياه            0.54
existed           0.53
__*/              0.52
Infórmanos        0.52
Activations Density: 0.372%