INDEX
    Explanations

    instances of the word "know" signaling awareness or recognition

    Negative Logits
    "ingen"         -0.17
    "_superuser"    -0.16
    "wire"          -0.15
    "pa"            -0.15
    "yah"           -0.14
    ".Unity"        -0.14
    "owl"           -0.14
    " pa"           -0.14
    "wine"          -0.14
    " Sug"          -0.14
    Positive Logits
    "irsch"         0.20
    "ãĥĥãĥĦ"        0.16
    "odon"          0.16
    "iland"         0.15
    "ooth"          0.15
    "ionales"       0.15
    "uis"           0.15
    "enas"          0.15
    ".once"         0.14
    "Ñģо"           0.14
    Activation Density: 0.033%

    No Known Activations