INDEX
    Explanations
    Negative Logits
    Token                 Logit
    " plais"              -0.07
    "incident"            -0.07
    " occurrence"         -0.07
    " قائمة"              -0.06
    "621"                 -0.06
    "_posts"              -0.06
    (blank in source)     -0.06
    " books"              -0.06
    " Play"               -0.06
    " DST"                -0.06
    Positive Logits
    Token                 Logit
    "Germany"             0.10
    " German"             0.10
    " Germany"            0.10
    "German"              0.09
    "вен"                 0.08
    "けて"                0.07
    " Germans"            0.07
    "ndern"               0.07
    " Rosa"               0.07
    "Manufact"            0.07
    Activation Density: 0.017%

    No Known Activations