INDEX
Explanations
instances of the word "we" in various contexts, indicating a focus on collective identity or shared experiences
Negative Logits
ctor        -0.18
cf          -0.15
cala        -0.15
beforeSend  -0.15
rog         -0.14
ree         -0.14
aye         -0.14
sh          -0.14
hawk        -0.14
g           -0.13
Positive Logits
eping     0.21
athers    0.18
bservice  0.17
’re       0.17
’ll       0.16
eding     0.16
’ve       0.16
've       0.15
eds       0.15
ights     0.15
Activations Density: 0.291%