INDEX
    Explanations

    intercept, forwards, type

    Negative Logits
        "3"          0.63
        "4"          0.62
        " be"        0.61
        " in"        0.61
        "6"          0.59
        "2"          0.57
        " T"         0.51
        " tr"        0.50
        "5"          0.50
        " pl"        0.49
    Positive Logits
        [token not rendered]   0.54
        "ląd"                  0.52
        "Loksatta"             0.51
        "FindingsResponse"     0.51
        "ব্ধ"                   0.49
        "RetResult"            0.49
        "Daten"                0.48
        " informée"            0.48
        "Bild"                 0.48
        "🍵"                   0.48
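    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. Below is a minimal sketch of how such lists are commonly produced, assuming (this is an assumption, not stated on the page) that a feature's logit effect is its decoder direction projected through the model's unembedding matrix. The names `top_logit_effects`, `w_dec_row` and `w_u` are hypothetical, the matrices are random toy data, and the page appears to print the negative values without a minus sign.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Return the k most positive and k most negative logit effects.

    Hypothetical: assumes a feature's logit effect is its decoder
    direction multiplied by the unembedding matrix (w_dec_row @ w_u).
    """
    effects = w_dec_row @ w_u                 # shape: (vocab_size,)
    order = np.argsort(effects)               # indices, ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy usage with made-up shapes: d_model = 4, a vocabulary of 6 tokens.
vocab = ["3", "4", " be", "Daten", "Bild", "🍵"]
rng = np.random.default_rng(0)
w_dec_row = rng.normal(size=4)
w_u = rng.normal(size=(4, len(vocab)))
pos, neg = top_logit_effects(w_dec_row, w_u, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

    Sorting once with `argsort` and reading both ends gives the top and bottom k in a single pass over the vocabulary.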
    Activation Density: 0.003%
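    An activation density of 0.003% means the feature fires on roughly 3 of every 100,000 tokens. A minimal sketch of that arithmetic, assuming (hypothetically) that density is the percentage of tokens whose activation exceeds zero; the function name `activation_density` and the toy activations are illustrative only.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens on which the feature fires (activation > threshold).

    Hypothetical definition: real dashboards may use a different threshold
    or token sample.
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy data: the feature fires on 3 tokens out of 100,000.
acts = np.zeros(100_000)
acts[[10, 500, 99_999]] = 1.7
print(f"{activation_density(acts):.3f}%")  # prints 0.003%
```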

    No Known Activations