    Explanations

    phrases introducing lists or sets of items or actions

    phrases that introduce lists or examples

    Negative Logits
    "terness"     -0.80
    "irlf"        -0.79
    "osate"       -0.78
    "tick"        -0.73
    "slaught"     -0.72
    " Canaver"    -0.72
    "opus"        -0.72
    "gow"         -0.72
    "Fed"         -0.71
    "fecture"     -0.71
    Positive Logits
    " include"    1.23
    " are"        1.12
    " latter"     1.11
    " items"      1.11
    " devices"    1.08
    " types"      1.07
    " entities"   1.06
    " kinds"      1.05
    " relate"     1.03
    " aren"       1.03
    Activation Density: 0.120%

    No Known Activations