INDEX
    Explanations

    short phrases that introduce or summarize information

    phrases that indicate summaries or overviews

    Negative Logits
    "anism"       -0.80
    "['"          -0.75
    "acid"        -0.71
    "same"        -0.71
    "cair"        -0.70
    "agents"      -0.69
    " eg"         -0.68
    "hes"         -0.68
    "evidence"    -0.67
    "alties"      -0.67

    Positive Logits
    " couple"     1.15
    " few"        1.10
    " glimpse"    1.09
    " bunch"      1.07
    " lot"        1.06
    " handful"    1.02
    "cknowled"    1.02
    " slew"       0.98
    " plethora"   0.95
    " snippet"    0.94

    Act Density 0.368%

    No Known Activations