    Negative Logits
    Token             Logit
    "LINK"            -0.07
    "iring"           -0.06
    " frat"           -0.06
    " ferv"           -0.06
    "Sessions"        -0.06
    "@Table"          -0.06
    " translator"     -0.06
    "School"          -0.06
    " States"         -0.06
    "λοι"             -0.06
    Positive Logits
    Token             Logit
    "the"             0.09
    ":The"            0.09
    " THE"            0.09
    "-the"            0.07
    " khăn"           0.07
    " The"            0.07
    " küçük"          0.07
    "THE"             0.07
    "_the"            0.07
    " θ"              0.07
    Activation Density: 0.064%

    No Known Activations