Explanations
phrases indicating a high level of quality or value
Negative Logits
Token    Logit
pras     -0.17
hence    -0.16
grand    -0.14
Hence    -0.14
mani     -0.14
омен     -0.14
really   -0.14
ru       -0.14
alon     -0.14
浩       -0.13
Positive Logits
Token    Logit
eur      0.17
acket    0.16
avin     0.15
-mini    0.14
CAF      0.14
uario    0.13
ritz     0.13
hetto    0.13
LS       0.13
izont    0.13
Activation Density: 0.016%