INDEX
    Explanations

    guide-like content offering assistance or information

    phrases indicating guidance or instructions

    Negative Logits
    Token           Logit
    " boycot"       -0.69
    " disliked"     -0.69
    " liking"       -0.69
    "cakes"         -0.65
    " wearing"      -0.65
    "ween"          -0.65
    " kisses"       -0.64
    " stealing"     -0.64
    "Offline"       -0.63
    " staged"       -0.62
    Positive Logits
    Token           Logit
    " summarize"    1.50
    " eluc"         1.42
    " summar"       1.36
    " explain"      1.26
    " enlight"      1.25
    " clarify"      1.25
    " illustrate"   1.25
    " outline"      1.22
    " summarizes"   1.21
    " illuminate"   1.19
    Act Density 0.250%

    No Known Activations