INDEX
    Explanations

    phrases related to contrasting or opposing ideas

    Negative Logits
    .",
    -0.90
    `.
    -0.72
    ``
    -0.67
     mathemat
    -0.66
    .''.
    -0.62
    ''.
    -0.62
    inav
    -0.59
     `
    -0.59
     streng
    -0.56
    Anth
    -0.56
    Positive Logits
    )—
    1.97
    1.69
    1.44
    )
    1.29
    !)
    1.28
    )--
    1.25
    1.25
     ―
    1.21
     --
    1.20
     )
    1.19
    Activation Density 0.634%

    No Known Activations