INDEX
    Explanations

    words that start a sentence

    Negative Logits
    Token          Value
    " codigo"      1.08
    " rekind"      0.99
    " flagship"    0.98
    "費用"         0.96
    " crushed"     0.95
    " dampened"    0.94
    " crushing"    0.94
    (no token)     0.94
    " urethra"     0.93
    " lapar"       0.92
    Positive Logits
    Token          Value
    "Когда"        0.96
    "evil"         0.95
    "خ"            0.92
    "fulness"      0.92
    "ت"            0.91
    "Symptoms"     0.90
    "synthetic"    0.90
    "Byte"         0.90
    " говоря"      0.89
    "temper"       0.89
    Activation Density: 0.001%

    No Known Activations