INDEX
    Explanations

    informal expressions and colloquial language

    Negative Logits
    Token                  Logit
    "ImageContext"         -0.54
    " validamos"           -0.50
    " ostavi"              -0.50
    " pleaſure"            -0.44
    " ModelExpression"     -0.44
    " مشين"                -0.41
    "ตร์"                   -0.41
    "🟤"                    -0.41
    "存于互联网档案馆"        -0.40
    "Архівовано"           -0.40
    Positive Logits
    Token                  Logit
    "icoli"                0.46
    "phor"                 0.44
    "stry"                 0.43
    "I"                    0.43
    "meta"                 0.42
    "fib"                  0.42
    " gotta"               0.42
    "fish"                 0.41
    "Meta"                 0.41
    "hit"                  0.40
    Activation Density: 0.010%

    No Known Activations