INDEX
    Explanations

    specific numeric and technical formatting, particularly in a programming or mathematical context

    Negative Logits
    /Gate    -0.14
    ¶Į    -0.14
    ë³    -0.14
    unn    -0.14
    ÏĥÏĩ    -0.13
    .tem    -0.13
     ÐŁÐ¾Ðº    -0.13
    erp    -0.13
    ppelin    -0.13
    ------+------+    -0.13
    Positive Logits
    onta    0.16
    cede    0.16
    æį·    0.15
     Carnegie    0.14
    onia    0.14
     spoilers    0.13
    reator    0.13
    ancel    0.13
     Habit    0.13
     Suc    0.13
    Act Density 0.065%

    No Known Activations