    Explanations

    phrases indicating the concept of "opposites" or contrasting ideas

    Negative Logits
    Token            Logit
    "lac"            -0.16
    " adipiscing"    -0.16
    "IDD"            -0.16
    "lings"          -0.15
    "lin"            -0.15
    "self"           -0.14
    "istics"         -0.14
    "ling"           -0.14
    "lis"            -0.14
    "iri"            -0.14

    Positive Logits
    Token            Logit
    "-sex"           0.20
    "/op"            0.19
    " extremes"      0.18
    " extreme"       0.17
    "veau"           0.17
    " nhau"          0.17
    " effect"        0.17
    " direction"     0.16
    ".Toolkit"       0.16
    "-direction"     0.15
    Activation Density: 0.021%

    No Known Activations