INDEX
    Explanations

    this is a defining characteristic

    NEGATIVE LOGITS
     justement  0.76
    ujo  0.74
     Это  0.73
     simplemente  0.72
    আন  0.71
     คือ  0.70
     exatamente  0.69
     efectivamente  0.68
    那就是  0.68
     అనే  0.68
    POSITIVE LOGITS
    visible  0.76
     visible  0.70
     possible  0.69
     available  0.67
     culminating  0.67
     varies  0.66
     accessible  0.65
     commensurate  0.65
     필요  0.65
     oversaw  0.63
    Act Density 0.477%

    No Known Activations