INDEX
Explanations
phrases related to project updates or progress reports
the word "our" indicating collective experience or belonging
Negative Logits
Token      Logit
conom      -0.78
puff       -0.75
cum        -0.72
ppings     -0.72
bender     -0.72
ussen      -0.70
icter      -0.69
appears    -0.69
tar        -0.69
more       -0.68
Positive Logits
Token        Logit
selves       1.20
own          1.05
beloved      0.93
motto        0.87
ourselves    0.87
collective   0.86
respective   0.86
ancestors    0.85
adversaries  0.83
dear         0.83
Activations Density 0.135%
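
The density figure above is presumably the fraction of sampled tokens on which the feature fires. The page does not state how it is computed, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical; the toy counts were chosen only so the result matches the 0.135% shown.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means (tokens where the feature is active) / (all tokens).
    The dashboard itself does not document its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up for illustration): 27 active tokens out of 20,000.
toy_activations = [1.0] * 27 + [0.0] * 19973
density = activation_density(toy_activations)
print(f"Activations Density {density:.3%}")
```

A density this low means the feature is sparse: under the assumed definition, it is active on roughly one token in every 740.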