INDEX
    Explanations

    phrases indicating actions, achievements, or obligations

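    Negative Logits
    Token           Logit  Gloss
    " Usaha"        -0.42  Indonesian "effort, business"
    " themſelves"   -0.40  "themselves" (long-s spelling)
    "elcome"        -0.40
    " itſelf"       -0.38  "itself" (long-s spelling)
    " both"         -0.38
    " alſo"         -0.38  "also" (long-s spelling)
    " keduanya"     -0.37  Indonesian "both"
    (not rendered)  -0.36
    " căng"         -0.36  Vietnamese "taut, tense"
    " gleiche"      -0.36  German "same"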
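    Positive Logits
    Token           Logit  Gloss
    "Only"          0.94
    " Only"         0.93
    " only"         0.91
    "only"          0.88
    "ONLY"          0.84
    " ONLY"         0.83
    " лишь"         0.81   Russian "only, merely"
    " רק"           0.71   Hebrew "only"
    " только"       0.70   Russian "only"
    "Hanya"         0.66   Indonesian "only"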
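The logit lists above rank vocabulary tokens by how strongly the feature pushes their output logits down or up. A common way to obtain such lists for a sparse-autoencoder feature is to take the dot product of the feature's decoder direction with each token's column of the model's unembedding matrix and sort. The sketch below assumes exactly that; the function name `logit_effects`, the toy matrices, and the absence of any scaling are all hypothetical, since the dashboard does not state its exact computation. Tokens are compared as exact strings, which is why " only", "only", and "Only" appear as separate entries: many subword tokenizers treat a leading space and different casing as distinct tokens.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank tokens by a feature's direct effect on their logits.

    Assumption (hypothetical setup): w_u is a d_model x vocab_size
    unembedding matrix given as a list of rows, and the effect on token t
    is the dot product of decoder_dir with column t of w_u.
    Returns (k most negative, k most positive) as (token, score) pairs.
    """
    if len(w_u) != len(decoder_dir):
        raise ValueError("w_u must have one row per decoder dimension")
    effects = []
    for t, token in enumerate(vocab):
        # Dot product of the feature direction with token t's unembedding column.
        score = sum(d * row[t] for d, row in zip(decoder_dir, w_u))
        effects.append((token, score))
    ranked = sorted(effects, key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]


# Toy example: a 2-dimensional model with a 4-token vocabulary.
vocab = [" only", " both", "Only", " also"]
w_u = [
    [0.9, -0.4, 0.8, -0.3],
    [0.1, 0.2, 0.0, 0.5],
]
negative, positive = logit_effects([1.0, 0.0], w_u, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

Sorting once and slicing from both ends keeps the negative and positive lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial sort (for example `heapq.nsmallest` and `heapq.nlargest`) would avoid sorting every score.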
    Activation Density: 0.040%
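Activation density is commonly read as the percentage of sampled tokens on which the feature fires; under that reading, 0.040% corresponds to roughly 4 active tokens in every 10,000. The sketch below assumes that definition and also assumes "fires" means an activation strictly above zero, as for a ReLU-style feature. The function name `activation_density` and its `threshold` parameter are hypothetical, not taken from the dashboard.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which a feature's activation exceeds threshold.

    Assumption: a token counts as active when its activation is strictly
    greater than threshold (0.0 by default, as for a ReLU-style feature).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return 100.0 * active / total


# Toy example: 4 active tokens out of 10,000 sampled tokens.
acts = [0.0] * 9996 + [1.2, 0.3, 0.5, 2.0]
print(f"{activation_density(acts):.3f}%")  # prints 0.040%
```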

    No Known Activations