INDEX
    Explanations

    mentions of the word "ali" with different activation levels

    mentions of 'ali' or variations thereof

    Negative Logits
    manship       -0.84
    acters        -0.84
    ly            -0.79
    IAL           -0.77
    ilater        -0.77
    lier          -0.76
    olicy         -0.74
    nect          -0.74
    ilaterally    -0.74
    liest         -0.73
    Positive Logits
    yah           1.17
    ensis         0.99
    ć             0.98
    ño            0.94
    ña            0.85
    ñ             0.84
    qi            0.84
     Lama         0.81
    WAYS          0.81
    ère           0.80
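    Dashboards like this one commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then keeping the k highest and k lowest token scores. The source does not say how these particular values were computed, so the following is only a minimal sketch under that assumption. The function name `top_and_bottom_logits`, its parameters, and the toy vocabulary and weights are all hypothetical.

```python
from typing import Sequence


def top_and_bottom_logits(
    decoder_direction: Sequence[float],
    unembed: Sequence[Sequence[float]],  # hypothetical: d_model rows x vocab_size columns
    vocab: Sequence[str],
    k: int,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Project a feature direction onto every vocabulary logit and return the
    k highest (positive) and k lowest (negative) token/score pairs."""
    d_model = len(decoder_direction)
    # Logit effect on token j = sum over i of direction[i] * unembed[i][j].
    scores = [
        sum(decoder_direction[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 5-token vocabulary
# (hypothetical numbers, not taken from the feature above).
vocab = ["yah", "ensis", "manship", "ly", "qi"]
direction = [1.0, -0.5]
unembed = [
    [0.9, 0.6, -0.7, -0.5, 0.4],
    [-0.2, -0.4, 0.3, 0.2, -0.6],
]
positive, negative = top_and_bottom_logits(direction, unembed, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

    Sorting once and slicing both ends keeps the sketch short; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` / `heapq.nsmallest` would avoid sorting every score.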
    Activation Density: 0.029%
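    Activation density is usually reported as the share of sampled tokens on which the feature's activation exceeds zero. Read that way, 0.029% corresponds to roughly 29 firing tokens per 100,000. The source gives neither the sample size nor the threshold, so the function below is a sketch under those assumptions; `activation_density` and the toy activation list are hypothetical.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 29 firing tokens out of 100,000 gives 0.029%.
toy_activations = [0.0] * 100_000
for i in range(29):
    toy_activations[i * 3000] = 1.5  # arbitrary positive activation value
density = activation_density(toy_activations)
print(f"{density:.3f}%")
```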

    No Known Activations