INDEX
    Explanations
    No Explanations Found
    NEGATIVE LOGITS
     Jihad      -0.76
     Bans       -0.67
    ▬▬          -0.63
    Rah         -0.62
     awa        -0.60
     invested   -0.60
     Bris       -0.60
     Haley      -0.60
    Found       -0.59
    %:          -0.58
    POSITIVE LOGITS
    ername      0.77
    onut        0.74
    agnetic     0.72
    glers       0.71
    renheit     0.70
    drm         0.69
    erk         0.69
    rists       0.69
    aido        0.68
    lies        0.67
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.