INDEX
Explanations
phrases that imply comparison or similarity
Negative Logits
Token      Logit
artment    -0.16
ITER       -0.15
("         -0.15
owo        -0.15
exion      -0.14
Poh        -0.14
leted      -0.14
ington     -0.14
dri        -0.14
shal       -0.14
Positive Logits
Token      Logit
though     0.41
Though     0.30
Though     0.30
though     0.27
будто      0.22
tho        0.21
inine      0.17
aunque     0.17
if         0.16
Tho        0.15
Activations Density 0.014%