INDEX
    Explanations

    phrases emphasizing mutual support and connection among individuals

    Negative Logits
    uci             -0.78
    0002            -0.70
    lam             -0.64
    hift            -0.63
     reimb          -0.63
    ariat           -0.61
    °               -0.61
    alty            -0.60
    nit             -0.60
     DK             -0.60
    Positive Logits
    selves          0.92
    worldly         0.83
     individually   0.80
    self            0.74
     equally        0.71
    heric           0.70
     offensively    0.68
    龍喚士          0.68
    anguages        0.68
     mutually       0.67
    Act Density 0.009%

    No Known Activations