INDEX
    Explanations

    references to color combinations and their descriptions

    Negative logits
    Token        Logit
    "adir"       -0.15
    "kud"        -0.15
    "quirrel"    -0.14
    "-alist"     -0.14
    "illez"      -0.14
    "owell"      -0.13
    "orney"      -0.13
    "adla"       -0.13
    "há"         -0.13
    "ابر"        -0.13
    Positive logits (a leading space inside the quotes is part of the token)
    Token        Logit
    " white"      0.51
    " green"      0.48
    " yellow"     0.48
    " blue"       0.45
    " black"      0.40
    " orange"     0.39
    "white"       0.38
    " brown"      0.38
    " red"        0.38
    "green"       0.34
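    Logit lists like these are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The page does not say how its lists were computed, so the sketch below assumes that method and omits any layer-norm folding a real dashboard might apply. The function names, the tokens `tok_a` through `tok_d`, and all vectors are hypothetical toy values, not this feature's weights.

    ```python
    from typing import List, Sequence, Tuple

    def logit_contributions(decoder_dir: Sequence[float],
                            unembed: Sequence[Sequence[float]]) -> List[float]:
        # Assumed method: score for token t = sum_i decoder_dir[i] * unembed[i][t]
        # (unembed is laid out as d_model rows by vocab_size columns).
        d_model = len(decoder_dir)
        vocab_size = len(unembed[0])
        return [sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
                for t in range(vocab_size)]

    def top_and_bottom(tokens: Sequence[str], scores: Sequence[float],
                       k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
        # Sort ascending by score, then take both ends of the ranking.
        ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
        bottom = ranked[:k]        # most negative first
        top = ranked[::-1][:k]     # most positive first
        return top, bottom

    # Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
    tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]
    decoder_dir = [1.0, 0.5]
    unembed = [[0.4, 0.3, -0.1, 0.0],
               [0.2, 0.2, -0.1, -0.2]]

    scores = logit_contributions(decoder_dir, unembed)
    top, bottom = top_and_bottom(tokens, scores, k=2)
    print("positive:", top)
    print("negative:", bottom)
    ```

    With these toy numbers, `tok_a` and `tok_b` land in the positive list and `tok_c` and `tok_d` in the negative list, mirroring how the two lists above are ordered.
    
    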
    Activation density: 0.257%
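    Activation density is usually reported as the percentage of sampled tokens on which the feature's activation is nonzero; at 0.257%, that is roughly 1 token in 390. The page states neither its token sample nor its threshold, so the sketch below assumes a strict `> 0` test over a hypothetical list of activations; `activation_density` is an illustrative name, not a library function.

    ```python
    from typing import Sequence

    def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
        # Assumed definition: percentage of tokens whose activation exceeds threshold.
        if not activations:
            raise ValueError("need at least one activation")
        firing = sum(1 for a in activations if a > threshold)
        return 100.0 * firing / len(activations)

    # Hypothetical sample: 3 of 1000 tokens activate the feature.
    acts = [0.0] * 997 + [1.2, 0.4, 3.1]
    density = activation_density(acts)
    print(f"{density:.3f}%")
    ```
    
    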

    No Known Activations