    Explanations

    verbs followed by common words

    Negative Logits
     स्टूडेंट
    0.53
    0.47
    0.47
    0.46
     étroites
    0.45
     ঠিক
    0.45
     dejó
    0.45
    striatis
    0.44
     보니
    0.44
     tecido
    0.44
    Positive Logits
    lun
    0.43
    以便
    0.42
    كن
    0.42
    RNN
    0.41
     Zn
    0.41
     quản
    0.40
    \(
    0.40
    arz
    0.39
    andered
    0.39
     moderator
    0.38
    Act Density 0.000%

    No Known Activations