    Explanations

    measurements of distance

    Negative Logits
    ерб         -0.15
    ाव          -0.15
    CRET        -0.14
    rint        -0.14
    Boss        -0.14
    ipeg        -0.14
     taj        -0.14
     Yen        -0.14
    _DISABLE    -0.13
    кав         -0.13
    Positive Logits
    alla        0.17
    atts        0.15
     Meh        0.14
    рах         0.14
    Norm        0.14
    iffe        0.14
    clo         0.14
    och         0.14
     Forge      0.14
    oeff        0.14
    Act Density 0.002%

    No Known Activations