    Explanations

    references to companionship or group dynamics

    Negative Logits
    Token             Logit
    " itself"         -0.27
    " Its"            -0.18
    "odzi"            -0.18
    "Its"             -0.17
    " it"             -0.16
    " its"            -0.15
    " for"            -0.15
    " there"          -0.15
    " which"          -0.14
    " while"          -0.14
    Positive Logits
    Token             Logit
    "/or"             0.25
    " myself"         0.22
    " ourselves"      0.21
    ".scalablytyped"  0.21
    "erson"           0.20
    " crew"           0.20
    " others"         0.20
    " several"        0.19
    " millions"       0.19
    " cohorts"        0.18
    Activation Density: 0.098%

    No Known Activations