INDEX
Explanations
phrases indicating a comparison or contrast
language indicating contrast or opposition in discussions
Negative Logits
redo          -0.71
eda           -0.58
enegger       -0.57
Highlights    -0.55
Dresden       -0.54
mberg         -0.53
Conquer       -0.53
Semin         -0.53
Wag           -0.52
Bam           -0.52
Positive Logits
to            0.92
thereto       0.91
itably        0.71
thodox        0.70
acles         0.68
uitive        0.67
to            0.67
TO            0.64
ract          0.62
osite         0.60
Activations Density: 0.019%