INDEX
    Negative Logits
    Token          Logit
    "ת"            1.07
    "ס"            0.94
    " are"         0.90
    "ت"            0.84
    "it"           0.84
    " neoprene"    0.81
    "ה"            0.81
    "ות"           0.79
    "ب"            0.77
    " sono"        0.76
    Positive Logits
    Token          Logit
    "batross"      0.79
    "asc"          0.71
    "inition"      0.68
    "text"         0.67
    "ash"          0.64
    "forcement"    0.64
    "ving"         0.64
    "ani"          0.63
    "posed"        0.63
    "conv"         0.62
    Activation Density: 0.001%

    No Known Activations