INDEX
    Explanations
    NEGATIVE LOGITS
    Token        Logit
     Frog        -0.08
     dataset     -0.07
     Wolves      -0.07
     Safari      -0.07
     sopr        -0.06
    coil         -0.06
     leaks       -0.06
     Martinez    -0.06
    .button      -0.06
     damages     -0.06
    POSITIVE LOGITS
    Token        Logit
    الق          0.07
    .we          0.06
    ічні         0.06
    (not shown)  0.06
    Recommend    0.06
    注册         0.06
    (not shown)  0.06
    abbrev       0.06
    ọt           0.06
    .Toast       0.06
    Activation Density: 0.012%

    No Known Activations