    Explanations

    phrases indicating a collective or group action

    references to collective actions or concepts

    Negative Logits
    Token             Logit
    " confirmation"   -0.65
    " Lovely"         -0.59
    "rick"            -0.59
    " disguise"       -0.58
    "old"             -0.57
    "jar"             -0.55
    "ss"              -0.55
    " removal"        -0.55
    " old"            -0.54
    " replacement"    -0.54

    Positive Logits
    Token             Logit
    " collectively"   3.78
    " individually"   1.62
    " collective"     1.55
    " jointly"        1.50
    " respectively"   1.30
    "collect"         1.29
    "together"        1.28
    " together"       1.24
    " unanimously"    1.19
    " toget"          1.09
    Activation Density 0.016%

    No Known Activations