INDEX
    Explanations

    phrases indicating lack of association or connection between different entities or concepts

    phrases emphasizing causation or relationships between actions

    Negative Logits
     feasibility    -0.62
    riger           -0.60
    esthes          -0.59
     holiest        -0.57
    ements          -0.56
    cember          -0.56
     Gust           -0.54
    hooting         -0.53
    aniel           -0.53
    res             -0.52
    Positive Logits
     contribute     0.76
     celebrate      0.74
     differentiate  0.73
    speak           0.73
     spare          0.73
     prove          0.73
    "></            0.72
     satisfy        0.70
     settle         0.70
    ensed           0.69
    Activation Density 0.061%

    No Known Activations