INDEX
    Explanations

    mentions of intense or extreme situations, often negative

    Negative Logits
     overlook    -0.71
     favor       -0.66
     scheduled   -0.65
     shop        -0.65
     lookout     -0.64
     derby       -0.64
     disapprove  -0.64
     termin      -0.64
     tro         -0.63
     adjud       -0.63
    Positive Logits
    They         1.27
    We           1.25
    Our          1.15
    It           1.13
    I            1.12
    There        1.12
    Where        1.12
    Sometimes    1.10
    Too          1.10
    Because      1.09
    Activation Density: 0.103%

    No Known Activations