INDEX
Explanations
the verb "be" and its various forms in different contexts
Negative Logits
UNUSED      -0.17
privileged  -0.16
acier       -0.15
atik        -0.15
utely       -0.14
acob        -0.14
onica       -0.14
átka        -0.14
erton       -0.14
непри       -0.14
Positive Logits
caught      0.26
deter       0.23
sucked      0.23
carried     0.23
sw          0.23
sed         0.22
suck        0.21
Caught      0.21
sid         0.21
lul         0.21
Activations Density 0.088%