INDEX
Explanations
phrases expressing value or importance
evaluative statements about worth or significance
Negative Logits
Token       Logit
but         -0.87
But         -0.72
But         -0.67
eatured     -0.66
hari        -0.62
ructose     -0.62
BUT         -0.61
schild      -0.60
ornia       -0.60
but         -0.59
Positive Logits
Token           Logit
nonetheless     1.85
anyway          1.14
nevertheless    1.10
anyways         1.05
etheless        1.05
owing           0.87
.               0.82
insofar         0.82
because         0.81
.[              0.77
Activations Density 0.951%
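
As a rough illustration of how a density figure like the one above could be read, the sketch below assumes that density means the fraction of sampled token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample values are all assumptions made for illustration; they are not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the
    feature fires at all. The source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Hypothetical activations: the feature fires on 1 of 100 positions.
    sample = [0.0] * 99 + [2.5]
    print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.951% would mean the feature is active on slightly fewer than 1 in 100 sampled tokens.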