INDEX
    Explanations

    expressions of desire and choice regarding actions or preferences

    Negative Logits
    
    Token        Logit
    (blank)      -0.50
    " Zep"       -0.45
    " Frag"      -0.44
    "ím"         -0.42
    " Zelt"      -0.41
    "Count"      -0.41
    "地道"       -0.40
    " trate"     -0.40
    "Working"    -0.40
    "이는"       -0.39
    Positive Logits
    Token            Logit
    " desired"       1.08
    " desire"        1.06
    "desire"         0.98
    "desired"        0.93
    " wished"        0.86
    " wish"          0.86
    " desires"       0.83
    " convenient"    0.80
    " IndexPath"     0.80
    "argout"         0.78
    Activation Density: 0.202%

    No Known Activations