INDEX
Explanations
phrases expressing similarity or comparison
Negative Logits
urus          -0.17
PLEX          -0.16
istrovství    -0.14
mic           -0.14
Schro         -0.14
fro           -0.14
Merchant      -0.13
iscard        -0.13
scribe        -0.13
merchant      -0.13
Positive Logits
llx           0.15
.variables    0.15
Outlined      0.15
heraus        0.14
antry         0.14
Giul          0.14
Kens          0.14
.nlm          0.13
rollo         0.13
idf           0.13
Activations Density 0.014%