INDEX
Explanations
references to companionship or group dynamics
Negative Logits
Token      Logit
itself     -0.27
Its        -0.18
odzi       -0.18
Its        -0.17
it         -0.16
its        -0.15
for        -0.15
there      -0.15
which      -0.14
while      -0.14
Positive Logits
Token             Logit
/or               0.25
myself            0.22
ourselves         0.21
.scalablytyped    0.21
erson             0.20
crew              0.20
others            0.20
several           0.19
millions          0.19
cohorts           0.18
Activations Density: 0.098%