    Explanations

    words associated with confusion or instability

    Negative Logits
    Token         Logit
    " refined"    -0.65
    "ngth"        -0.64
    "catentry"    -0.62
    " ner"        -0.58
    " diction"    -0.55
    "Dial"        -0.54
    " gifted"     -0.54
    " libel"      -0.53
    "chell"       -0.53
    " towed"      -0.53
    Positive Logits
    Token         Logit
    "ither"       0.89
    "rift"        0.85
    "asa"         0.78
    "oros"        0.76
    "ewater"      0.76
    "acia"        0.72
    "abus"        0.72
    "alys"        0.71
    "oon"         0.71
    "ike"         0.69
    Activation Density: 0.031%

    No Known Activations