    Explanations
    Negative Logits
    "i"            1.48
    "t"            0.89
    "alas"         0.87
    "the"          0.86
    "rations"      0.85
    "tooth"        0.84
    "faced"        0.82
    "spanning"     0.82
    "ut"           0.80
    "floral"       0.80
    Positive Logits
    "'"                     1.15
    (token not rendered)    1.02
    "ك"                     0.99
    " physicist"            0.96
    "ק"                     0.95
    "أ"                     0.91
    " physics"              0.90
    "ה"                     0.81
    (token not rendered)    0.81
    " PHYSICS"              0.80
    Act Density 0.026%

    No Known Activations