INDEX
    Explanations

    instances of proper nouns or names

    Negative Logits
                       -0.62
     a                 -0.56
     that              -0.53
    <eos>              -0.52
     with              -0.51
     or                -0.48
     on                -0.48
     in                -0.47
     from              -0.47
    :                  -0.46
    Positive Logits
    MessageOf          0.93
     Efq               0.93
    ]--;               0.87
    tagHelperRunner    0.84
     Meksiku           0.83
     للمعارف           0.83
     Italijanski       0.83
    (!__               0.79
     iſt               0.79
    aarrggbb           0.77
    Act Density 0.196%

    No Known Activations