INDEX
    Explanations

    statements that reveal something surprising or unexpected

    Negative Logits
    cious        -0.81
    oided        -0.80
    avorite      -0.77
    resents      -0.77
    erve         -0.76
    erved        -0.75
    shaw         -0.71
    ettlement    -0.70
    cius         -0.69
    heed         -0.69
    Positive Logits
    ─            0.77
    wards        0.76
     GOODMAN     0.74
    ctors        0.72
    lier         0.69
    Meet         0.67
    ¯¯¯¯¯¯¯¯     0.67
    skirts       0.65
    ymes         0.64
    ы            0.64
    Activation density: 0.021%

    No Known Activations