Explanations
instances of the verb "be" in various forms
Negative Logits
Token      Logit
being      -0.35
Being      -0.34
Being      -0.33
being      -0.33
-being     -0.29
被         -0.27
被         -0.24
already    -0.24
now        -0.24
never      -0.24
Positive Logits
Token      Logit
friend     0.36
able       0.33
COME       0.32
fall       0.31
get        0.28
fit        0.27
have       0.26
que        0.26
eline      0.24
stride     0.22
Activations Density: 0.293%
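
The page does not define the density figure. A minimal sketch, assuming it is the share of sampled tokens on which the feature's activation is above zero, is shown below. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means (active tokens) / (all sampled tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 tokens, 3 of which activate the feature.
toy = [0.0] * 997 + [1.2, 0.4, 2.5]
print(f"{activation_density(toy):.3%}")  # prints 0.300%
```

Under this assumption, a density of 0.293% would mean the feature fires on roughly 3 of every 1,000 sampled tokens.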