INDEX
    Explanations

    references to specific individuals and their actions in a narrative context

    Negative Logits
                    -0.43
                    -0.39
    ;               -0.38
    非常的          -0.34
     and            -0.34
     include        -0.34
     illetve        -0.32
     primarily      -0.32
    ,               -0.32
     considerable   -0.30
    Positive Logits
     Normdatei      1.15
    +#+#            1.09
    <unused52>      1.08
    sizeCache       1.08
    [@BOS@]         1.08
    <pad>           1.08
    <unused14>      1.07
    <unused16>      1.07
    <unused8>       1.07
    <unused3>       1.07
    Act Density 0.075%

    No Known Activations