    Negative Logits
    Token            Logit
    "ap"             0.64
    " крити"         0.49
    "h"              0.48
    "m"              0.47
    "ior"            0.44
    " മികച്ച"        0.44
    "oe"             0.43
    " supérieurs"    0.43
    "et"             0.42
    " eer"           0.42
    Positive Logits
    Token                  Logit
    " 🥰"                  0.76
    " 💕"                  0.66
    "🥰"                   0.62
    " nostalgia"           0.56
    " 💞"                  0.55
    (token not shown)      0.54
    " 😍"                  0.52
    " ❤️"                  0.51
    (token not shown)      0.51
    "hearts"               0.50
    Activation density: 0.189%

    No known activations