INDEX
Explanations
phrases related to comparison or contrast
instances of the word "and" or similar conjunctions in lists
Negative Logits
cies          -0.68
itionally     -0.68
itions        -0.67
itives        -0.66
orer          -0.65
que           -0.65
hens          -0.64
coat          -0.63
enta          -0.62
eed           -0.62
Positive Logits
rongh         0.65
somew         0.62
recomm        0.59
diminishing   0.59
Aph           0.59
Ares          0.58
uana          0.58
disqual       0.58
grav          0.57
accompanied   0.57
Activations Density: 0.206%