INDEX
Explanations
phrases indicating a comparison or contrast
comparisons or contrasts between two ideas or states
Negative Logits
enegger     -0.69
Kard        -0.67
ells        -0.66
Niet        -0.65
Carnival    -0.65
Quake       -0.64
boys        -0.61
haven       -0.61
oufl        -0.60
Dirt        -0.59
Positive Logits
itably         0.95
entimes        0.79
necessarily    0.75
isons          0.73
onent          0.70
acles          0.70
lihood         0.69
ively          0.68
lectic         0.66
materially     0.65
Activation density: 0.019%
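To put the density figure in concrete terms, the sketch below assumes that activation density means the percentage of tokens on which the feature's activation is strictly positive. The dashboard does not state its exact definition or threshold, so treat both as assumptions. Under that reading, 0.019% corresponds to roughly 19 active tokens per 100,000. The function name `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" = share of tokens with activation > threshold
    (strictly positive by default); the dashboard's definition may differ.
    """
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 100,000 tokens, the feature fires on 19 of them.
acts = [0.0] * 100_000
for i in range(19):
    acts[i * 5000] = 1.0

print(f"{activation_density(acts):.3f}%")  # 0.019%
```

A very low density like this one indicates a sparse feature that fires on only a small number of contexts.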