    Negative Logits
    Token            Logit
    " includ"        -0.07
    " therap"        -0.07
    " Hwy"           -0.07
    " scal"          -0.06
    " nutrients"     -0.06
    " пло"           -0.06
    " plo"           -0.06
    "tent"           -0.06
    " Give"          -0.06
    " inval"         -0.06
    Positive Logits
    Token            Logit
    " modern"        0.14
    " Modern"        0.13
    "Modern"         0.13
    "modern"         0.09
    "ov"             0.07
    "(figsize"       0.07
    "21"             0.07
    "_CORE"          0.07
    " modem"         0.07
    (token not shown) 0.07
    Activation Density: 0.012%

    No Known Activations