INDEX
    Explanations

    the word "为" (translated as "for" or "to be" in English)

    Negative Logits
    Logit   Token
    -0.60    ")");
    -0.59   %")
    -0.54    Anton
    -0.53   %";
    -0.53   %");
    -0.52   NSIndexPath
    -0.52   /')
    -0.51    Akk
    -0.50    recons
    -0.50   kni
    Positive Logits
    Logit   Token
    2.50
    2.45
    2.30     为
    2.19     為
    1.95
    1.45    为自己
    1.33    为他
    1.15    为你
    1.14    为人
    1.11    图为
    Act Density 0.098%

    No Known Activations