INDEX
    Explanations

    phrases indicating importance and relevance and the surrounding context

    Negative Logits
     nahilalakip
    -1.03
    Geplaatst
    -0.99
     EconPapers
    -0.96
    ]")]
    -0.93
    MLLoader
    -0.91
    MessageOf
    -0.89
     ویکی‌پدیا
    -0.85
    EDEFAULT
    -0.85
    ^(@)
    -0.84
     שוליים
    -0.84
    Positive Logits
     K
    0.52
    oms
    0.48
     A
    0.48
     Par
    0.47
    /
    0.47
    T
    0.47
     T
    0.46
     head
    0.46
     par
    0.46
    0.45
    Act Density 0.232%

    No Known Activations