INDEX
    Explanations

    phrases related to information dissemination or updates

    phrases indicating familiarity or awareness about ongoing topics or discussions

    Negative Logits
    "Dialogue"            -0.65
    "grades"              -0.61
    " prolong"            -0.59
    " sacrific"           -0.58
    " openness"           -0.57
    " preserves"          -0.56
    " stunts"             -0.56
    " differentiation"    -0.55
    " downgrade"          -0.55
    " stabilization"      -0.55
    Positive Logits
    " guessed"            1.14
    " noticed"            1.13
    " familiar"           1.12
    " heard"              1.06
    " know"               0.95
    " watched"            0.93
    "know"                0.92
    " acquainted"         0.91
    " remember"           0.88
    " already"            0.88
    Activation Density: 0.249%

    No Known Activations