INDEX
    Explanations

    modal verbs

    Negative Logits
    Token         Logit
    " CLO"        -0.08
    " RSV"        -0.08
    " prad"       -0.08
    " summed"     -0.08
    "ریب"         -0.07
    " Hassan"     -0.07
    " rodi"       -0.07
    " tamb"       -0.07
    " nao"        -0.07
    " δω"         -0.07
    Positive Logits
    Token         Logit
    "_partner"    0.09
    " partners"   0.09
    " partner"    0.09
    "ight"        0.08
    "partner"     0.08
    " ergän"      0.08
    "makes"       0.08
    "F"           0.08
    [blank]       0.07
    "Partner"     0.07
    Act Density 0.080%

    No Known Activations