INDEX
    Explanations

    phrases emphasizing collective responsibility and shared experiences

    Negative Logits
     never
    -0.21
     not
    -0.19
     không
    -0.19
     neither
    -0.18
     tidak
    -0.18
     cannot
    -0.18
     nicht
    -0.18
    不会
    -0.17
     doesn
    -0.17
    所有
    -0.17
    Positive Logits
    uded
    0.29
    ude
    0.25
    uding
    0.23
     alike
    0.22
    ready
    0.21
    ayed
    0.21
    udes
    0.20
    -important
    0.19
    ways
    0.19
    LLLL
    0.19
    Act Density 0.079%

    No Known Activations