    NEGATIVE LOGITS
    Token          Value
    "."            1.47
    " as"          1.44
    " ."           1.17
    "h"            1.16
    " are"         1.13
    " you"         1.12
    "ig"           1.02
    "us"           1.02
    (missing)      1.00
    "u"            0.98
    POSITIVE LOGITS
    Token          Value
    "د"            1.59
    "т"            1.58
    " widgets"     1.28
    "м"            1.25
    "ي"            1.24
    " widget"      1.23
    "интере"       1.21
    "ل"            1.20
    " analiza"     1.18
    "ان"           1.16
    Activation density: 0.005%

    No Known Activations