    Negative Logits
     βρί    -0.07
     QText    -0.07
     peş    -0.07
    fuck    -0.07
     "->    -0.07
    -0.07
     sophistic    -0.07
     "\(    -0.07
     غذایی    -0.07
     [#    -0.06
    Positive Logits
     Stalin    0.06
     Future    0.06
    ’on    0.06
     gauge    0.06
    urity    0.06
     нез    0.06
    ignon    0.06
     competed    0.06
     Regulation    0.06
    (fout    0.06
    Activation Density: 0.041%

    No Known Activations