INDEX
    Explanations

    words related to specific locations or tribes, likely the Tuareg tribe given the activations

    the word "are" in different contexts

    Negative Logits
    ingen       -0.80
    omez        -0.76
    ured        -0.76
    ulates      -0.74
    uration     -0.73
    ues         -0.72
    enegger     -0.72
    inatory     -0.71
    isting      -0.69
    inosaur     -0.68
    Positive Logits
    tto         1.12
    tta         1.08
    nce         1.01
    lli         1.01
    llan        0.97
    nda         0.93
    tsky        0.92
    ndra        0.92
    nces        0.90
    zza         0.90
    Act Density 0.018%

    No Known Activations