INDEX
    Explanations
    NEGATIVE LOGITS
     restrictive    -0.08
    Restr           -0.08
     restring       -0.08
    anno            -0.08
                    -0.08
     appointed      -0.08
     Restricted     -0.08
     clinical       -0.08
     promoted       -0.07
     Vertical       -0.07
    POSITIVE LOGITS
     interpolate    0.08
     interpolation  0.08
     Wass           0.08
    .spotify        0.08
    _interp         0.08
    interp          0.08
     mole           0.08
    .decrypt        0.08
     drunk          0.07
    .loader         0.07
    Act Density 0.001%

    No Known Activations