INDEX
    Explanations

    harmful impact on others

    NEGATIVE LOGITS
    " களை"        0.41
    "相手"         0.40
    "qdm"          0.39
    " partners"    0.39
    "icmp"         0.39
    "well"         0.38
    "Preferred"    0.38
    "objects"      0.38
    "partner"      0.38
    "preferred"    0.37
    POSITIVE LOGITS
    " others"      0.86
    " Others"      0.70
    "Others"       0.68
    "其他人"       0.67
    "others"       0.64
    " advising"    0.59
    " someone"     0.56
    " दूसरों"       0.55
    "他人"         0.54
    " alguém"      0.54
    Act Density 0.084%

    No Known Activations