    Explanations

    examples or instances mentioned in a text

    references to examples and instances in explanations

    Negative Logits
    ibly        -0.75
    resy        -0.72
    shaw        -0.69
    orship      -0.69
    rity        -0.67
    cedented    -0.66
    lied        -0.65
    ses         -0.64
    eers        -0.63
    Pg          -0.63
    Positive Logits
     suppose     1.44
     Suppose     1.24
     imagine     1.20
     consider    1.11
     if          0.98
     Imagine     0.89
     Consider    0.85
     compare     0.80
     let         0.79
     whereas     0.78
    Activation Density: 0.135%

    No Known Activations