Explanations
negations or expressions of contradiction
Negative Logits
DockStyle          -0.93
purpoſe            -0.91
AddTagHelper       -0.90
Jefus              -0.88
myſelf             -0.88
يتيمه              -0.87
ModelExpression    -0.84
disambiguazione    -0.84
pleaſure           -0.83
cauſe              -0.82
Positive Logits
not                0.98
going              0.96
is                 0.95
a                  0.95
also               0.86
an                 0.80
really             0.77
being              0.76
likely             0.75
WAS                0.75
Activations Density: 0.102%