INDEX
    Explanations

    references to figures and visual data in a scientific context

    Negative Logits
    Token        Logit
    "elan"       -0.19
    "мага"       -0.16
    "otland"     -0.16
    "dG"         -0.16
    "ÅĽcie"      -0.15
    "ecies"      -0.15
    "ialog"      -0.15
    "ainless"    -0.15
    "izza"       -0.15
    "اÙĬر"       -0.15

    Positive Logits
    Token        Logit
    "ht"         0.24
    " ht"        0.20
    " t"         0.18
    "width"      0.17
    " bh"        0.17
    " HT"        0.16
    "-HT"        0.15
    "bh"         0.15
    "bp"         0.15
    "t"          0.15

    Act Density 0.007%

    No Known Activations