Explanations
Phrases indicating comparison or contrast
Phrases expressing different perspectives or interpretations
Negative Logits
UCH       -0.72
andel     -0.68
grade     -0.65
gur       -0.63
iere      -0.63
ilage     -0.63
aquin     -0.61
aeus      -0.61
erm       -0.60
prus      -0.60
Positive Logits
resembles    0.87
resemble     0.69
resembled    0.69
stranger     0.69
resemb       0.68
parallels    0.65
hift         0.64
Lange        0.64
perverse     0.62
analogous    0.62
Activation density: 0.042%