INDEX
    Explanations

    phrases with the pattern "âĢĻ" followed by a number as a token

    instances of a particular character or symbol

    Negative Logits
    Token             Logit
    " imitation"      -0.71
    " carbohyd"       -0.67
    "arios"           -0.65
    "raviolet"        -0.64
    "ramid"           -0.63
    " pyramid"        -0.63
    "iage"            -0.62
    "wana"            -0.62
    " convenience"    -0.62
    " XT"             -0.61
    Positive Logits
    Token    Logit
    "女"     1.03
    "Ļ"      0.96
    "ï¸ı"    0.92
    "İ"      0.87
    "Ùħ"     0.86
    "Ľ"      0.85
    "ı"      0.84
    "Ķ"      0.82
    "ļ"      0.82
    "ð"      0.81
    Activation Density: 0.488%

    No Known Activations
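
    Several tokens on this page (for example "âĢĻ" in the first explanation, and most of the positive-logit entries) look like byte-level BPE renderings rather than readable text. The page does not say which tokenizer produced them, so the sketch below assumes a GPT-2 style byte-to-character table; the helper names (bytes_to_unicode, token_to_bytes, decode_token) are illustrative and not part of any dashboard API. Under that assumption, each character in a token stands for one raw byte, and a token is readable only if its bytes form complete UTF-8.

```python
def bytes_to_unicode():
    """Build the byte -> printable character table used by GPT-2 style
    byte-level BPE (assumed here; the page does not name its tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: displayed character -> raw byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token):
    """Map each displayed character of a token back to its raw byte."""
    return bytes(BYTE_DECODER[ch] for ch in token)


def decode_token(token):
    """Return the readable text for a token, or None if its bytes are only
    a fragment of a UTF-8 sequence. Tokens containing characters outside
    the byte table are assumed to be already decoded and returned as-is."""
    if any(ch not in BYTE_DECODER for ch in token):
        return token
    try:
        return token_to_bytes(token).decode("utf-8")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    print(repr(decode_token("âĢĻ")))   # '’' (U+2019, right single quotation mark)
    print(token_to_bytes("Ļ"))          # b'\x99', a lone continuation byte
    print(decode_token("Ļ"))            # None
```

    If the assumption holds, the "âĢĻ" in the first explanation is a right single quotation mark, and single-character entries such as "Ļ" are lone UTF-8 continuation bytes that cannot be displayed on their own.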