INDEX
    Explanations

    the word "phrase" followed by an intense activation value

    recurring phrases or sentence structures

    Negative Logits
        "DERR"          -0.84
        "ÄŁ"            -0.83
        "©¶æ¥µ"         -0.77
        " Thro"         -0.73
        " Skydragon"    -0.70
        "Adds"          -0.68
        "fman"          -0.68
        "Ka"            -0.66
        "hari"          -0.65
        "ntil"          -0.63

    Positive Logits
        "phrase"        1.07
        "ology"         1.06
        " phrases"      1.03
        " phrase"       1.02
        " uttered"      0.87
        "terday"        0.86
        "witz"          0.82
        "mith"          0.77
        "atre"          0.74
        "eting"         0.74
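The page lists these logit values without saying how they were computed. A common approach, assumed here rather than confirmed by this page, is to treat the feature as a sparse-autoencoder latent, project its decoder direction through the model's unembedding matrix, and rank vocabulary tokens by the resulting score. The sketch below uses plain Python lists. `top_logit_effects`, `decoder_dir`, and `unembed` are hypothetical names, and any normalization a real dashboard might apply (for example folding in the final layer norm) is not modeled.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    Hypothetical inputs (assumed shapes, not taken from the dashboard):
      decoder_dir: d_model floats, the feature's decoder direction
      unembed:     d_model rows of vocab_size floats (the unembedding W_U)
      vocab:       vocab_size token strings, in W_U column order
    Returns (top_positive, top_negative), each a list of (token, score).
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per decoder dimension")
    if any(len(row) != len(vocab) for row in unembed):
        raise ValueError("each unembed row must have one entry per token")

    # Score for token j is the dot product of decoder_dir with column j of W_U.
    scores = [
        sum(d * row[j] for d, row in zip(decoder_dir, unembed))
        for j in range(len(vocab))
    ]
    # Sort ascending once: the head holds the most negative scores,
    # and the reversed list holds the most positive ones.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]
    top_positive = ranked[::-1][:k]
    return top_positive, top_negative
```

For example, with a two-dimensional toy model, `top_logit_effects([1.0, 0.5], [[1.0, -1.0, 0.0], [0.0, 2.0, -2.0]], ["a", "b", "c"], k=1)` returns `([("a", 1.0)], [("c", -1.0)])`. Token strings are kept exactly as the tokenizer stores them, which is why `"phrase"` and `" phrase"` (with a leading space) appear as separate entries in the list above.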
    Activation Density: 0.016%
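Activation density is commonly read as the fraction of token positions on which the feature's activation is above zero. The page does not define it, so treat that reading as an assumption. Under it, 0.016% is 0.00016, or roughly one firing token in every 6,250. A minimal sketch with a hypothetical `activation_density` helper:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions where the activation exceeds `threshold`.

    `activations` is a hypothetical flat list of one feature's activations,
    one value per token position across an evaluation corpus.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for value in activations if value > threshold)
    return firing / len(activations)


# One firing position out of 6,250 gives the density shown above.
acts = [0.0] * 6249 + [3.2]
print(f"{activation_density(acts):.3%}")  # prints 0.016%
```

The `threshold` parameter is an illustrative addition; some tools count any nonzero activation, others use a small positive cutoff.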

    No Known Activations