    Explanations

    mandatory instructions or updates

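    NEGATIVE LOGITS
    (Tokens are quoted; a leading space is part of the token.)
    " rifi"  0.48
    " दोस्त"  0.47  (Hindi: "friend")
    " သူ့"  0.47  (Burmese: "his/her")
    "好友"  0.46  (Chinese: "close friend")
    " žmog"  0.45
    " prijatel"  0.45  (cf. South Slavic "prijatelj": "friend")
    " 얘가"  0.44  (Korean: "this kid", subject form)
    "élène"  0.44
    " friends"  0.44
    " adventures"  0.44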
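    POSITIVE LOGITS
    "全員"  0.89  (Japanese: "all members, everyone")
    "集体"  0.75  (Chinese: "collective")
    " collectively"  0.75
    "Everyone"  0.70
    " everyone"  0.69
    "everyone"  0.69
    " everybody"  0.68
    " Everybody"  0.68
    " Everyone"  0.67
    "统一"  0.66  (Chinese: "unified, unify")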
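    Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the largest and smallest entries. This page does not state its exact procedure, so the following is only a minimal sketch under that assumption. The names `w_dec` (decoder direction, shape `(d_model,)`), `W_U` (unembedding, shape `(d_model, vocab_size)`), `top_logit_effects`, and the toy vocabulary are all hypothetical.

```python
import numpy as np

def top_logit_effects(w_dec, W_U, vocab, k=3):
    """Return the k tokens with the largest and smallest direct logit effect.

    Hypothetical inputs:
      w_dec: feature decoder direction, shape (d_model,)
      W_U:   unembedding matrix, shape (d_model, vocab_size)
      vocab: list of token strings, length vocab_size
    Values are returned signed, as computed.
    """
    logits = w_dec @ W_U          # effect of the feature on every token's logit
    order = np.argsort(logits)    # token indices, ascending by logit effect
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["everyone", " collectively", " friends", " the"]
W_U = np.array([[0.9, 0.7, -0.5, 0.1],
                [0.0, 0.0, 0.0, 0.0]])
pos, neg = top_logit_effects(np.array([1.0, 0.0]), W_U, vocab, k=2)
print(pos)  # [('everyone', 0.9), (' collectively', 0.7)]
print(neg)  # [(' friends', -0.5), (' the', 0.1)]
```

    Sorting the full vocabulary once and slicing both ends keeps the positive and negative lists consistent with each other. For a large vocabulary, `np.argpartition` would avoid the full sort.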
    Activation density: 0.043%
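    Activation density is typically the fraction of evaluated tokens on which the feature fires. At 0.043%, that is roughly 4.3 activating tokens per 10,000. The page does not say which dataset or activation threshold was used. The following minimal sketch assumes a 1-D array of per-token activations and a threshold of zero. Both the function name `activation_density` and its inputs are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: 4 activating tokens out of 10,000.
acts = np.zeros(10_000)
acts[:4] = 1.5
print(activation_density(acts))  # 0.04
```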

    No Known Activations