Explanations
comparative statements indicating something is quantitatively superior or inferior
phrases indicating a significant distance or disparity
Negative Logits
Token            Logit
ortium           -0.62
Polo             -0.59
Turns            -0.58
piracy           -0.58
Dice             -0.57
ulus             -0.56
opoulos          -0.55
Revolutionary    -0.55
Correction       -0.55
ictions          -0.55
Positive Logits
Token            Logit
med              1.13
fetched          1.07
outwe            1.07
outnumbered      0.98
superior         0.94
fewer            0.91
outweigh         0.91
worse            0.89
thing            0.89
preferable       0.89
Activations Density: 0.036%