    Negative Logits
    Token           Value
     Jura           0.90
    histo           0.90
     crypto         0.90
     Minneapolis    0.89
     Prairie        0.85
     Drake          0.84
     Crypto         0.84
     Sinatra        0.84
    ‌و               0.83
    Prairie         0.83
    Positive Logits
    Token           Value
    {\              2.26
     {\             2.26
    }{\             2.05
    ]{\             1.74
    ){\             1.72
    ","             1.67
    ,{\             1.59
    )}{\            1.56
    }}{\            1.56
     ${\            1.52
    Activation Density: 0.206%

    No Known Activations