    Explanations
    No Explanations Found
    Negative Logits
    iants       -0.84
    onge        -0.77
    wolves      -0.72
    sburgh      -0.71
    othal       -0.71
    ieth        -0.71
    umblr       -0.68
    sburg       -0.67
    Detroit     -0.66
    tale        -0.66
    Positive Logits
     Ramadan     0.77
     hijab       0.75
    atel         0.72
     Braz        0.68
     Afgh        0.63
     Persian     0.60
    ruction      0.60
     Airl        0.59
    ال           0.59
     revelation  0.59
    Activation Density 0.000%

    This feature has no known activations.