INDEX
    Explanations

    words/stories/language

    Negative Logits
    Token           Value
    ".Sup"          -0.07
    ":A"            -0.07
    " DOWN"         -0.07
    "(True"         -0.06
    " Hero"         -0.06
    " diseñador"    -0.06
    ".it"           -0.06
    "Gate"          -0.06
    ".Template"     -0.06
    "bil"           -0.06
    Positive Logits
    Token           Value
    "cond"          0.07
    ".roles"        0.06
    "=document"     0.06
    " internally"   0.06
    "proj"          0.06
    " parted"       0.06
    "hound"         0.06
    "cstring"       0.06
    "vision"        0.06
    " smelled"      0.06
    Act Density 0.036%

    No Known Activations