INDEX
    Explanations

    describing physical states or actions

    NEGATIVE LOGITS
    !";          0.65
    !';          0.63
    !;           0.62
     !!!!        0.56
    !!!!!        0.55
    !!!!!!       0.55
    !!!!         0.54
     !!!!!       0.51
    !".          0.50
    !!!!!!!      0.50
    POSITIVE LOGITS
     despite         0.54
     amidst          0.47
     Beside          0.44
     beside          0.44
     رغم             0.42
     impatiently     0.42
     instinctively   0.42
     while           0.41
    这才             0.41
     whilst          0.41
    Activation Density 0.108%

    No Known Activations