INDEX
    Explanations

    names or keywords related to specific entities, possibly with a focus on names that are not common nouns

    proper nouns and specific names

    Negative Logits
     STATS         -0.68
     innocence     -0.60
     Cinderella    -0.60
     meditation    -0.60
    Reviewer       -0.59
     Ou            -0.59
    Constructed    -0.58
     LSD           -0.58
     attendant     -0.58
     preservation  -0.58

    Positive Logits
    edo            1.08
    oshenko        1.04
    obl            0.87
    nel            0.87
    asury          0.86
    opot           0.80
    hiba           0.80
    arkin          0.79
    oren           0.79
    flix           0.78
    Act Density 0.097%

    No Known Activations