INDEX
Explanations
instances where actions are being directed towards a specific recipient, often with a call for support or sharing
conjunctions and phrases that suggest collective action or shared experiences
Negative Logits
Token       Logit
Ahead       -0.70
smanship    -0.65
iren        -0.65
ourse       -0.59
flagship    -0.59
raph        -0.58
duc         -0.58
ahead       -0.57
rup         -0.57
estate      -0.56
Positive Logits
Token       Logit
selves      0.79
Templ       0.72
vous        0.71
selves      0.69
azy         0.68
imaru       0.66
mates       0.63
otent       0.62
ortmund     0.62
THEN        0.61
Activations Density: 0.527%