INDEX
    Explanations

    instances of the word "to" in various forms

    Negative Logits
    τηγορ    -0.16
    visor    -0.16
    raya     -0.15
    ulur     -0.15
    asons    -0.14
    irim     -0.14
     frau    -0.14
    lea      -0.14
    rens     -0.14
    Ћ        -0.14

    Positive Logits
    zap      0.16
    ANS      0.15
    kr       0.14
    sid      0.14
    ού       0.14
    oc       0.14
     Craw    0.13
    ICE      0.13
    ά        0.13
    ns       0.13
    Act Density 0.049%

    No Known Activations