INDEX
    Explanations

    referring to entities and actions

    Negative Logits
    Token        Value
    "h"          0.31
    "The"        0.28
    " They"      0.28
    " "          0.27
    "They"       0.27
    "den"        0.26
    " stabil"    0.26
    " It"        0.25
    "cd"         0.25
    "I"          0.25
    Positive Logits
    Token        Value
    "ל"          0.30
    " for"       0.29
    " começ"     0.29
    "ную"        0.29
    "μα"         0.28
    "تي"         0.28
    "ید"         0.28
    "льне"       0.27
    "ால்"        0.27
    "م"          0.27
    Activation Density: 0.085%

    No Known Activations