INDEX
    Explanations

    instances of emotional expressions and subjective evaluations

    Negative Logits
     dis          -0.34
    ])->          -0.31
    ))->          -0.30
    )=>{\n        -0.29
     τῆς          -0.29
    僕が          -0.29
    Tembelea      -0.29
    それに        -0.28
    )]=           -0.28
     foreign      -0.28
    Positive Logits
     kasarigan     0.60
     好文分享      0.57
     パンチラ      0.56
     yaiba         0.56
    <unused68>     0.55
    <unused41>     0.55
    Бахар          0.55
     Dieſe         0.55
    <pad>          0.54
    <unused17>     0.54
    Activation Density: 1.006%

    No Known Activations