Explanations
phrases that describe actions or qualities related to more than one entity or concept
Negative Logits
Token          Logit
nonetheless    -0.75
etheless       -0.74
.''.           -0.69
]).            -0.65
ccording       -0.65
+++            -0.62
)))            -0.60
Published      -0.58
Lastly         -0.57
cellaneous     -0.56
Positive Logits
Token          Logit
but             1.32
but             1.17
But             0.99
BUT             0.96
But             0.95
;}              0.87
BUT             0.78
nor             0.74
However         0.72
})              0.72
Activations Density: 0.383%