Explanations
phrases that indicate decision-making and actions taken
Negative Logits
Token        Logit
059          -0.16
ål           -0.15
quo          -0.15
issance      -0.14
はず         -0.14
.variable    -0.13
adge         -0.13
133          -0.13
qid          -0.13
ooter        -0.13
Positive Logits
Token        Logit
leine        0.22
sure         0.19
ñana         0.17
angelo       0.17
zell         0.16
estre        0.15
onna         0.15
strides      0.15
genu         0.14
lei          0.14
Activations Density 0.142%