INDEX
Explanations
occurrences of the word "we" indicating a collective perspective or action
Negative Logits
Token     Logit
itself    -0.22
was       -0.19
ly        -0.17
(s        -0.15
aug       -0.15
st        -0.14
ctor      -0.14
ocate     -0.14
g         -0.13
dez       -0.13
Positive Logits
Token       Logit
ourselves   0.41
’re         0.39
're         0.36
've         0.34
’ve         0.32
are         0.31
eping       0.28
ходим       0.28
'll         0.28
’ll         0.27
Activations Density 0.297%
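
The page does not define the density figure. A common reading is that it is the share of sampled token positions on which the feature fires at all. The sketch below computes a density under that assumption; the function name `activation_density`, the zero threshold, and the sample values are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active (activation strictly greater than the threshold).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1,000 token positions, 3 of which activate the feature.
sample = [0.0] * 997 + [1.2, 0.4, 2.7]
density = activation_density(sample)

# Print as a percentage with three decimals, the same style as the figure above.
print(f"{density:.3%}")
```

Read this way, a density of 0.297% would mean the feature is active on roughly 3 of every 1,000 sampled tokens, which fits a feature tied to a single pronoun and its contractions.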