INDEX
Explanations
phrases indicating intentions or desires related to actions
Negative Logits
Token       Logit
jk          -0.15
Butler      -0.14
raid        -0.14
if          -0.14
acht        -0.14
vido        -0.13
/kernel     -0.13
fter        -0.13
ife         -0.13
Gonzalez    -0.13
Positive Logits
Token       Logit
know        0.18
лиÑĤ        0.15
@class      0.15
orz         0.15
ektor       0.14
ondo        0.14
omu         0.14
Sto         0.14
jac         0.14
Ziel        0.14
Activations Density: 0.087%