INDEX
    Explanations

    end of sentence or phrase

    Negative Logits
         The          -1.55
         Therefore    -1.45
    🙄                -1.40
                      -1.39
         Similarly    -1.38
         Arquivado    -1.36
    ĭ                 -1.34
    😄                -1.33
         (            -1.31
    😲                -1.31
    Positive Logits
    ↵↵                 3.67
    <eos>              1.81
    ")↵                1.55
    خخ                 1.50
         definitely    1.47
         will          1.41
    ↵↵↵                1.39
         would         1.38
    Zunanje            1.31
    !");               1.27
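The page does not say how these logit lists are produced. A common convention for SAE feature dashboards, assumed here rather than confirmed by the page, is to multiply the feature's decoder direction by the model's unembedding matrix and list the tokens with the largest (positive) and smallest (negative) results. The toy sketch below shows only that ranking step; the function names `logit_effects` and `top_bottom`, the two-dimensional vectors, and the five-token vocabulary are all hypothetical.

```python
# Toy sketch (assumed convention): a feature's effect on token t's logit is
# the dot product of its decoder direction with column t of the unembedding.
from typing import List, Tuple


def logit_effects(d_dec: List[float], w_u: List[List[float]]) -> List[float]:
    """Dot the decoder direction (length d_model) with every vocabulary
    column of the unembedding w_u (shape d_model x vocab)."""
    vocab = len(w_u[0])
    return [sum(d_dec[i] * w_u[i][t] for i in range(len(d_dec))) for t in range(vocab)]


def top_bottom(effects: List[float], tokens: List[str], k: int) -> Tuple[list, list]:
    """Return the k most promoted and k most suppressed (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest effects first
    positive = ranked[::-1][:k]    # largest effects first
    return positive, negative


# Hypothetical numbers, chosen only to illustrate the ranking.
tokens = ["↵↵", "<eos>", " The", " Therefore", " will"]
d_dec = [1.0, -0.5]
w_u = [
    [2.0, 1.0, -1.0, -0.8, 0.6],
    [-1.0, -0.4, 0.6, 0.5, 0.2],
]

effects = logit_effects(d_dec, w_u)
pos, neg = top_bottom(effects, tokens, k=2)
print("positive:", pos)
print("negative:", neg)
```

Real models have residual widths in the thousands and vocabularies of tens or hundreds of thousands of tokens, so in practice this would be a single matrix-vector product in a tensor library; the ranking step stays the same.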
    Act Density 0.067%
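Assuming "Act Density" means the percentage of tokens on which the feature's activation is above zero (the usual reading on SAE dashboards, though the page does not define it), 0.067% corresponds to roughly one token in 1,500. A minimal sketch of that calculation; the function name `activation_density` and the sample activations are hypothetical.

```python
# Minimal sketch (assumed definition): density = percentage of tokens whose
# feature activation exceeds the threshold (zero by default).
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of positions where the activation is above the threshold."""
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: the feature fires on 67 of 100,000 tokens.
sample = [0.0] * 99_933 + [0.5] * 67
print(f"{activation_density(sample):.3f}%")
```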

    No Known Activations