INDEX
    Explanations

    "all of these"

    Negative Logits
    Token           Logit
    BaseActivity    -0.63
    cipline         -0.54
    dafx            -0.54
    ")){            -0.53
    ITECT           -0.53
    itism           -0.52
    "){             -0.52
    ברס             -0.52
    ;">             -0.52
    FOOTNOTES       -0.52
    Positive Logits
    Token           Logit
    :✨              0.71
     things         0.69
     Things         0.57
     THINGS         0.54
     require        0.54
     πρά            0.52
     "..\..\..\     0.52
     translate      0.51
     مشين           0.51
     bezeichneter   0.50
    Activation Density: 0.002%

    No Known Activations