INDEX
    Explanations

    phrases that indicate work, effort, or complex interactions within narratives

    Negative Logits
    repid
    -0.14
     همچنین
    -0.14
    звичай
    -0.14
    بواسطة
    -0.13
    řes
    -0.13
    قه
    -0.13
    -awesome
    -0.12
    ismet
    -0.12
    وار
    -0.12
    орая
    -0.12
    Positive Logits
     too
    1.25
    too
    1.09
     Too
    1.03
     TOO
    1.02
    Too
    0.99
    太
    0.91
    -too
    0.88
     слишком
    0.80
     demasi
    0.80
     太
    0.75
    Act Density 0.659%

    No Known Activations