INDEX
    Explanations

    proper nouns, particularly names like "Andre" with varying levels of activation

    occurrences of the name "Andre."

    Negative Logits
    Token         Logit
    "inct"        -0.96
    "ulhu"        -0.80
    "manship"     -0.74
    "lishing"     -0.73
    "stakes"      -0.71
    "ointed"      -0.68
    "plain"       -0.67
    "yrinth"      -0.65
    "lied"        -0.65
    "lished"      -0.65
    Positive Logits
    Token         Logit
    "tti"         1.14
    "essen"       0.91
    "byss"        0.82
    "cats"        0.78
    " Andre"      0.78
    " Gord"       0.72
    " Paste"      0.71
    " XIII"       0.69
    "aic"         0.68
    "idis"        0.67
    Activation Density: 0.023%

    No Known Activations