INDEX
Explanations
phrases indicating capability or potential action
Negative Logits
ILINE         -0.06
alam          -0.06
patron        -0.05
chein         -0.05
Liberation    -0.05
know          -0.05
setId         -0.05
==============================================================  -0.05
directly      -0.05
Caul          -0.05
Positive Logits
215           0.08
adel          0.07
handle        0.07
opi           0.07
ade           0.07
handle        0.07
ocale         0.07
udrž          0.07
成功          0.07
HING          0.07
Activations Density 0.033%