INDEX
Explanations
words associated with actions or decisions made by characters
Negative Logits
gow      -0.17
esub     -0.15
apter    -0.15
tick     -0.15
anvas    -0.14
пода     -0.13
tır      -0.13
tit      -0.13
elve     -0.13
Naw      -0.13
Positive Logits
kön      0.32
dür      0.29
können   0.29
könnte   0.27
lassen   0.26
wollen   0.24
mö       0.23
mü       0.23
möchten  0.23
sollen   0.22
Activations Density 0.013%
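
The density figure can be read as a fraction of tokens. The sketch below assumes that "Activations Density" means the share of sampled tokens on which the feature's activation is above zero; the dashboard does not define the term here, so treat that reading as an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to reproduce the 0.013% figure.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (number of active tokens) / (number of tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, of which 13 activate the feature.
acts = [0.0] * 100_000
for i in range(13):
    acts[i * 7000] = 1.5  # arbitrary nonzero activation value

density = activation_density(acts)
print(f"{density:.3%}")  # 13 / 100,000 formatted as a percentage
```

Under this reading, a density of 0.013% corresponds to roughly 13 active tokens per 100,000, which is what one might expect from a sparse feature.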