    NEGATIVE LOGITS
    ة        0.68
    ed       0.63
    ת        0.58
    А        0.57
    For      0.55
    It       0.54
    With     0.51
    а        0.50
    ה        0.49
    There    0.48
    POSITIVE LOGITS
     purposes       0.72
     dealing        0.71
     assessing      0.70
     accessing      0.69
    erun            0.66
     obtaining      0.66
     instance       0.65
     achieving      0.65
     determining    0.63
     inclusivity    0.62
    Act Density 0.581%

    No Known Activations