INDEX
    Explanations
    Negative Logits
     myſelf          -1.07
     Jefus           -1.03
     greateſt        -1.02
     Theſe           -0.98
     Efq             -0.95
     themſelves      -0.93
     Diſ             -0.93
    تقاوى            -0.92
     contextLoads    -0.92
    AddTagHelper     -0.92
    Positive Logits
     label           0.57
    label            0.52
     per             0.49
     [               0.47
     several         0.45
     ,               0.45
     Tre             0.44
     tr              0.44
     pro             0.42
    Label            0.42
    Activation Density: 0.228%

    No Known Activations