    Negative Logits
    Token               Logit
    " angels"           -0.07
    "Wel"               -0.06
    "/ref"              -0.06
    "anced"             -0.06
    " verg"             -0.06
    " وف"               -0.06
    " fused"            -0.06
    " tvb"              -0.06
    " participation"    -0.06
    " Cunningham"       -0.06

    Positive Logits
    Token               Logit
    "dt"                0.17
    " dt"               0.15
    "DT"                0.14
    "(dt"               0.13
    "t"                 0.13
    " DT"               0.12
    ".dt"               0.11
    "_dt"               0.10
    "T"                 0.10
    "\tdt"              0.09

    Activation Density: 0.005%

    No Known Activations