INDEX
Explanations
instances of specific verbs related to observation and actions
NEGATIVE LOGITS
pls        -0.15
achen      -0.15
PELL       -0.15
AFE        -0.15
itta       -0.14
itler      -0.14
itte       -0.14
uali       -0.14
anson      -0.14
arring     -0.14
POSITIVE LOGITS
worden      0.37
werden      0.28
wird        0.23
becoming    0.23
wurde       0.23
Bec         0.21
wordt       0.21
become      0.20
becomes     0.20
bec         0.19
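The two lists above rank the tokens whose output logits this feature pushes down or up most strongly. A minimal sketch of how such a ranking is commonly produced, assuming (this page does not confirm it) that each token's value is the dot product of the feature's decoder direction with that token's column of the model's unembedding matrix. The function names, the toy two-dimensional vectors, and the four-token vocabulary are hypothetical illustrations, not the dashboard's actual code or data.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on their logits,
# ASSUMING effect(token) = decoder_dir · unembed[:, token].

def logit_effects(decoder_dir, unembed, vocab):
    """Return {token: effect}. `unembed` is a list of d_model rows,
    each row holding one value per vocabulary token."""
    effects = {}
    for t, token in enumerate(vocab):
        # Dot product of the decoder direction with the token's unembedding column.
        effects[token] = sum(d * row[t] for d, row in zip(decoder_dir, unembed))
    return effects


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# Toy example with made-up numbers (d_model = 2, four tokens).
vocab = ["werden", "become", "pls", "achen"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.3, 0.2, -0.10, -0.02],  # row for model dimension 0
    [0.1, 0.0, -0.10, -0.20],  # row for model dimension 1
]
effects = logit_effects(decoder_dir, unembed, vocab)
positive, negative = top_and_bottom(effects, k=2)
```

Here `positive` holds `werden` and `become`, and `negative` holds `pls` and `achen`. A real dashboard would do the same computation as one matrix-vector product over the full vocabulary rather than a Python loop.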
Activation density: 0.012%
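Activation density is read here as the fraction of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the page does not state the threshold or the dataset it was measured on). The helper names are hypothetical. At 0.012%, the feature is active on roughly 12 of every 100,000 tokens.

```python
# Hypothetical sketch: activation density = fraction of tokens where the feature
# activation exceeds a threshold (ASSUMED to be 0.0).

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


def as_percent(fraction):
    """Format a fraction as a percentage with three decimal places."""
    return f"{fraction * 100:.3f}%"


# Toy example: the feature fires on 12 of 100,000 tokens.
acts = [0.0] * 99_988 + [1.5] * 12
density = activation_density(acts)
label = as_percent(density)
```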