    Explanations

    the presence of the word "here" in various contexts

    Negative Logits
    " Ethiopian"   -0.52
    " Apoll"       -0.50
    " Tibetan"     -0.50
    " опро"        -0.48
    " Gogh"        -0.48
    " Fanning"     -0.47
    "pAd"          -0.47
    " Polonia"     -0.47
    " PD"          -0.47
    " Draft"       -0.46

    Positive Logits
    "here"          1.16
    "Here"          1.10
    " here"         1.09
    " Here"         1.08
    "HERE"          1.06
    " HERE"         1.00
    " aquí"         0.93
    " Aquí"         0.91
    " tää"          0.90
    " здесь"        0.84
    Activation Density: 0.088%

    No Known Activations