INDEX
    Explanations

    phrases indicating transformations or changes in circumstances

    Negative Logits
    Token        Logit
    "uti"        -0.15
    "ontent"     -0.15
    "byn"        -0.15
    " Truy"      -0.15
    "allo"       -0.14
    "omik"       -0.14
    "prites"     -0.14
    "elps"       -0.14
    " pry"       -0.14
    "à¥įतà¤ķ"    -0.14
    Positive Logits
    Token        Logit
    " nowhere"   0.48
    " thin"      0.27
    " nothing"   0.26
    "Thin"       0.25
    " blue"      0.25
    " Thin"      0.24
    "blue"       0.23
    "thin"       0.22
    "-blue"      0.21
    "nothing"    0.21
    Activation Density: 0.019%

    No Known Activations