    Explanations

    occurrences of the word "self" in various contexts

    Negative Logits
    sse         -0.17
    ture        -0.17
    éné         -0.16
    oplevel     -0.16
    åĻ          -0.14
     Nguyên     -0.14
    lém         -0.14
    Topology    -0.14
    agua        -0.14
    ristol      -0.14
    Positive Logits
    hoff        0.14
    590         0.14
     kil        0.14
     Aires      0.14
    gone        0.14
    "display    0.14
    911         0.14
    h           0.13
    zelf        0.13
    ă           0.13
    Activation Density: 0.010%

    No Known Activations