INDEX
Explanations
phrases indicating uncertainty or conditions related to actions and outcomes
Negative Logits
Ỽ         -0.15
iqu       -0.13
olk       -0.13
nors      -0.13
istr      -0.13
we        -0.13
czas      -0.13
ar        -0.13
anca      -0.13
ateurs    -0.13
Positive Logits
having    0.35
having    0.26
Having    0.24
Having    0.24
knowing   0.20
being     0.18
allowing  0.18
ayant     0.18
Presence  0.17
doing     0.17
Activations Density 0.530%