INDEX
Explanations
phrases that express exactness or specificity
Negative Logits
ye             -0.18
unts           -0.16
rella          -0.15
quist          -0.15
entic          -0.15
ess            -0.15
exion          -0.15
oran           -0.14
specifically   -0.14
ping           -0.14
Positive Logits
itude          0.26
opposite       0.20
itudes         0.17
ITUDE          0.17
-таки          0.16
aleb           0.16
elly           0.16
ไหน            0.16
;y             0.16
ingly          0.15
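The page does not say how these two lists are produced. One common approach for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the lowest- and highest-scoring vocabulary tokens. The sketch below assumes that approach. The function name `top_bottom_logits`, the list-of-lists layout for `unembed`, and the toy numbers are all hypothetical and do not come from this dashboard.

```python
def top_bottom_logits(decoder_dir, unembed, vocab, k=10):
    """Hypothetical sketch: score every vocab token by decoder_dir @ W_U.

    decoder_dir: list of d_model floats (assumed feature decoder direction)
    unembed:     d_model rows, each a list of len(vocab) floats (assumed W_U)
    vocab:       list of token strings, indexed like unembed's columns
    """
    d_model = len(decoder_dir)
    # Dot product of the feature direction with each token's unembedding column.
    scores = [
        sum(decoder_dir[d] * unembed[d][v] for d in range(d_model))
        for v in range(len(vocab))
    ]
    # Sort token indices from most negative to most positive score.
    order = sorted(range(len(vocab)), key=lambda v: scores[v])
    negative = [(vocab[v], scores[v]) for v in order[:k]]
    positive = [(vocab[v], scores[v]) for v in reversed(order[-k:])]
    return negative, positive


# Toy example with made-up values (2-dimensional model, 3-token vocabulary).
neg, pos = top_bottom_logits(
    decoder_dir=[1.0, -0.5],
    unembed=[[0.2, -0.1, 0.0], [0.0, 0.2, -0.6]],
    vocab=["itude", "ye", "opposite"],
    k=1,
)
print(neg, pos)
```

In the toy example "ye" scores lowest and "opposite" highest; real dashboards run the same ranking over the full vocabulary, which is why fragments such as "itude" and "ingly" can appear alongside whole words.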
Activation Density: 0.025%
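A density of 0.025% means the feature is active on roughly 1 in 4,000 tokens (0.025 / 100 = 1 / 4,000). The minimal sketch below assumes density is the percentage of tokens whose activation exceeds a threshold of zero; the function name and the threshold parameter are illustrative assumptions, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens where the feature fires (assumed: activation > threshold)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # assumption: any positive activation counts as "firing"
            active += 1
    # Guard against an empty sample to avoid division by zero.
    return 100.0 * active / total if total else 0.0


# One firing token out of 4,000 gives the 0.025% figure shown above.
sample = [0.0] * 3999 + [1.7]
print(activation_density(sample))
```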