    Explanations
    No Explanations Found
    Negative Logits
    Token           Value
    [unrendered]    0.96
    "et"            0.77
    "ed"            0.70
    "il"            0.68
    "u"             0.63
    "m"             0.63
    "و"             0.61
    "as"            0.61
    "م"             0.59
    "on"            0.57
    Positive Logits
    Token           Value
    "0"             0.73
    " are"          0.71
    " of"           0.66
    " were"         0.64
    " was"          0.62
    [whitespace]    0.59
    [unrendered]    0.56
    " sont"         0.50
    [unrendered]    0.49
    [unrendered]    0.45
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.