INDEX
Explanations
conditional phrases and expressions of uncertainty or past choices
Negative Logits
Token    Logit
ymes     -0.16
éis      -0.15
gili     -0.14
íĥģ      -0.14
engin    -0.14
nox      -0.14
dont     -0.14
wil      -0.14
ytt      -0.14
jen      -0.14
Positive Logits
Token    Logit
've      0.65
’ve      0.52
a        0.44
'a       0.44
ve       0.41
’a       0.35
'd       0.34
а        0.27
ta       0.27
da       0.27
Activations Density: 0.124%