Explanations
comparative phrases highlighting differences or similarities
instances of comparisons between different subjects or entities
Negative Logits
authorized   -0.72
Polo         -0.64
dar          -0.61
aida         -0.58
STAR         -0.58
gren         -0.56
Semin        -0.55
affe         -0.54
primary      -0.54
acket        -0.54
Positive Logits
to           1.05
thereto      1.01
favorably    0.97
favour       0.82
unto         0.79
with         0.76
to           0.72
icut         0.71
lihood       0.67
compared     0.67
Activations Density 0.028%