INDEX
Explanations
expressions of desire and expectations regarding quality and outcomes
Negative Logits
Token    Logit
Pert     -0.17
asca     -0.15
dana     -0.14
AJOR     -0.14
rum      -0.14
Eg       -0.14
Dawn     -0.14
ử        -0.14
acad     -0.14
Helm     -0.14
Positive Logits
Token     Logit
without   0.25
without   0.24
ohne      0.20
Without   0.20
Without   0.19
_without  0.18
ildo      0.17
senza     0.16
zonder    0.16
fast      0.16
Activations Density 0.010%