    Explanations

    references to the concept of "more" in various contexts

    Negative Logits
    Token         Logit
     hroz         -0.46
     masiva       -0.44
     Turquía      -0.42
     massive      -0.41
     large        -0.40
     huge         -0.40
     múltiple     -0.37
     Moscú        -0.36
     gigantes     -0.36
     aveug        -0.36
    Positive Logits
    Token         Logit
     esternos     0.77
    もう少し      0.73
     beetje       0.71
    <pad>         0.68
    <unused28>    0.68
    <unused14>    0.68
    <unused23>    0.68
    <unused16>    0.68
    [@BOS@]       0.68
    <unused8>     0.68
    Activation Density: 0.012%

    No Known Activations