INDEX
    Explanations

    questions or statements about general knowledge or information

    phrases related to conveying information or experiences

    Negative Logits
    swer      -0.73
    estone    -0.70
    idon      -0.67
    raid      -0.67
    robe      -0.66
    odge      -0.66
    xon       -0.64
    scan      -0.64
    ivery     -0.64
    etts      -0.63
    Positive Logits
     happens          1.48
     constitutes      1.42
     happened         1.25
     transpired       1.17
     separates        1.17
     distinguishes    1.14
     motiv            1.10
     kinds            1.09
     qualifies        1.07
     makes            1.05
    Act Density 0.095%

    No Known Activations