INDEX
Explanations
phrases or sentences comparing different entities or concepts
phrases emphasizing comparative structures or similarities
Negative Logits
umb        -0.75
balcon     -0.68
escal      -0.66
unequ      -0.64
bystand    -0.64
flares     -0.63
extraord   -0.62
emer       -0.62
ama        -0.61
omet       -0.60
Positive Logits
sex        0.72
ricanes    0.70
ounter     0.67
dragon     0.67
roman      0.66
riers      0.66
aign       0.64
aneously   0.64
Rico       0.64
same       0.64
Activations Density: 0.045%