INDEX
    Explanations

    instances of modal verbs indicating ability or possibility

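    Negative Logits
        Token           Logit
        " use"          -0.38
        " tersebut"     -0.36   (Indonesian/Malay: "the aforementioned")
        " itself"       -0.35
        "y"             -0.35
        " to"           -0.34
        "as"            -0.34
        " The"          -0.34
        "分ほど"         -0.34   (Japanese: "about ... minutes")
        (whitespace)    -0.33
        "/"             -0.32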
    Positive Logits
        Token           Logit
        "<unused43>"    1.02
        "<unused42>"    1.02
        "<unused28>"    1.02
        "<unused41>"    1.02
        "<unused3>"     1.02
        "[@BOS@]"       1.02
        "<unused8>"     1.02
        "<unused16>"    1.02
        "<unused47>"    1.02
        "<unused52>"    1.02
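    The dashboard does not say how these logit lists are produced. A common approach, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and read off the most negative and most positive entries. The sketch below uses a hypothetical helper `top_bottom_logits` and toy weights; the shapes (`decoder_dir`: d_model, `W_U`: d_model x vocab_size) are assumptions.

```python
import numpy as np


def top_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: the feature's direct effect on the output is approximated by
    projecting its decoder direction (shape: d_model) through the
    unembedding matrix W_U (shape: d_model x vocab_size).
    """
    logits = np.asarray(decoder_dir) @ np.asarray(W_U)  # shape: (vocab_size,)
    order = np.argsort(logits)  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: hypothetical 2-dimensional model with a 4-token vocabulary.
vocab = [" use", " to", "<unused3>", "/"]
W_U = np.array([[-1.0, -0.5, 2.0, 0.1],
                [0.0, 0.2, 0.5, -0.3]])
decoder_dir = np.array([0.5, 0.1])
neg, pos = top_bottom_logits(decoder_dir, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

    In this toy case the logits are -0.5, -0.23, 1.05 and 0.02, so " use" and " to" come out as the most negative tokens and "<unused3>" as the most positive.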
    Act Density 0.029%
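    An activation density of 0.029% means the feature is active on roughly 29 of every 100,000 token positions. The sketch below shows that arithmetic, assuming a feature counts as active whenever its activation exceeds 0; the helper name `activation_density_percent` and the threshold are illustrative assumptions, and the dashboard may use a different cutoff.

```python
import numpy as np


def activation_density_percent(activations, threshold=0.0):
    """Percentage of token positions where the feature fires.

    Assumption: a feature "fires" when its activation exceeds `threshold`
    (0.0 by default); a real dashboard may use a different cutoff.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Hypothetical sample: 29 firing positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:29] = 1.0
print(f"Act Density {activation_density_percent(acts):.3f}%")  # Act Density 0.029%
```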

    No Known Activations