INDEX
    Explanations

    names of notable people or references to prominent figures

    Negative Logits
    STANCE    -0.18
    older     -0.17
    luck      -0.15
    OLDER     -0.15
    dn        -0.15
    اور       -0.15
    OLON      -0.15
    ullo      -0.14
    olor      -0.14
    ential    -0.13
    Positive Logits
    querque    0.27
    azeera     0.23
     Al        0.18
    veriş      0.16
    gren       0.16
    andro      0.15
     Clock     0.15
     clock     0.15
    olen       0.15
    ivia       0.15
    Act Density 0.062%

    No Known Activations