    Explanations

    visual/sensory descriptions

    Negative Logits
    Token            Logit
    "uders"          0.54
    "pieces"         0.52
    "ম্প"             0.51
    "ður"            0.51
    "科技有限公司"     0.51
    "еди"            0.50
    " otherwise"     0.50
    " technology"    0.50
    " behold"        0.49
    " subunits"      0.49
    Positive Logits
    Token            Logit
    " tango"         0.80
    " leopard"       0.79
    " bustle"        0.77
    " optimism"      0.75
    " knitted"       0.75
    " danced"        0.75
    " sourire"       0.75
    " rosé"          0.74
    " waitress"      0.73
    " grapefruit"    0.73
    Activation Density: 0.120%

    No Known Activations