    Explanations

    mentions of singing and songs

    Negative Logits
                -0.60
    an          -0.56
    <bos>       -0.54
    ;           -0.51
    en          -0.50
    P           -0.50
    um          -0.49
    ↵↵          -0.48
    is          -0.47
    T           -0.47
    Positive Logits
     فريبيس          1.05
    Autoritní        1.05
    تقاوى            1.05
    脚注の使い方      1.05
    ✨:              0.99
     للمعارف         0.98
     المعيارى        0.96
    RenderAtEndOf    0.94
    󠁣                0.92
    expandindo       0.90
    Activation Density: 0.438%

    No Known Activations