Explanations
phrases indicating a comparison or similarity
Negative Logits
Token       Logit
↵↵          -0.15
.matches    -0.15
Tucker      -0.15
oras        -0.14
uja         -0.14
urus        -0.14
ingen       -0.14
usty        -0.14
ancellor    -0.14
tic         -0.14
Positive Logits
Token       Logit
401         0.15
antry       0.15
abin        0.15
425         0.14
ihan        0.14
ondo        0.14
onor        0.14
ライン      0.14
ewe         0.14
qx          0.13
Activations Density 0.057%