INDEX
Explanations
phrases that describe a comparison or equivalence between different entities or concepts
phrases that indicate equivalences or comparisons
Negative Logits
nonetheless     -0.80
bender          -0.78
erer            -0.78
hess            -0.73
nevertheless    -0.70
trave           -0.67
Became          -0.64
hran            -0.64
erers           -0.62
furthermore     -0.62
Positive Logits
lihood          0.94
Sov             0.74
anus            0.74
Schr            0.64
othing          0.63
onymous         0.61
blackmail       0.61
subsistence     0.61
otin            0.61
proverbial      0.58
Activations Density: 0.086%