    Negative Logits
        Token          Logit
        " Outline"     -0.07
        "724"          -0.06
        " Pixar"       -0.06
        " advocacy"    -0.06
        "Sampler"      -0.06
        " prá"         -0.06
        " Swift"       -0.06
        " erfolgre"    -0.06
        "024"          -0.06
        " assaults"    -0.06
    Positive Logits
        Token          Logit
        " religion"     0.18
        " Religion"     0.16
        " religious"    0.13
        " الدين"        0.09   (Arabic, "the religion")
        " Religious"    0.08
        " religions"    0.08
        " relig"        0.08
        "igion"         0.08
        "Radi"          0.08
        "وان"           0.07
    Activation Density: 0.009%

    No known activations.