    Explanations

    Names of, and references to, specific individuals, particularly in the context of movies or public figures.

    Negative Logits
        Token            Logit
        "wand"           -0.17
        "703"            -0.15
        "zin"            -0.14
        " Achilles"      -0.14
        "ull"            -0.14
        " surprised"     -0.14
        "yang"           -0.14
        "11"             -0.14
        " pip"           -0.14
        "afa"            -0.14
    (Tokens are quoted so that leading spaces stay visible.)

    Positive Logits
        Token            Logit
        "argin"           0.17
        "emez"            0.16
        "prites"          0.16
        " sami"           0.15
        "abor"            0.15
        "ustos"           0.15
        "bette"           0.15
        " PROCUREMENT"    0.15
        "ampa"            0.14
        " sam"            0.14
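
    The dashboard does not say how these logit values were computed. A common approach for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding matrix and keep the k most negative and k most positive tokens. The following is a minimal sketch under that assumption. `decoder_dir`, `unembed`, `logit_effects`, and `top_and_bottom` are hypothetical names, the toy numbers are invented, and the sketch ignores any final LayerNorm a real model would apply first.

```python
def logit_effects(decoder_dir, unembed):
    """Direct effect of a feature on every vocabulary logit.

    decoder_dir: hypothetical feature direction, a list of d_model floats.
    unembed: hypothetical unembedding matrix W_U, d_model rows of vocab_size floats.
    Assumption: the final LayerNorm is ignored here.
    """
    vocab_size = len(unembed[0])
    # Dot product of the feature direction with each vocabulary column of W_U.
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k):
    """Return the k most negative and k most positive (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return bottom, top


# Toy example with d_model = 2 and a four-token vocabulary (all values invented).
vocab = ["wand", " sam", "argin", "11"]
decoder_dir = [1.0, -0.5]
unembed = [[-0.1, 0.1, 0.2, 0.0],
           [0.1, 0.0, 0.1, 0.2]]
bottom, top = top_and_bottom(logit_effects(decoder_dir, unembed), vocab, k=2)
print("Negative:", [(tok, round(v, 2)) for tok, v in bottom])
print("Positive:", [(tok, round(v, 2)) for tok, v in top])
```

    Sorting once and slicing from both ends keeps the two lists consistent with each other. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest` / `heapq.nlargest` would avoid a full sort.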
    Activation Density: 0.051%
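
    Activation density is usually read as the fraction of tokens on which the feature fires. That is the assumption behind this sketch; the dashboard does not define the term. At 0.051%, the feature would be active on roughly 51 of every 100,000 tokens. The `threshold` parameter is hypothetical, and it defaults to counting strictly positive activations.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: "firing" means activation > threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy example: 51 firing tokens out of 100,000 (invented data).
acts = [0.0] * 100_000
for i in range(51):
    acts[i * 1000] = 1.0
print(f"{activation_density(acts):.3%}")
```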

    No Known Activations