INDEX
    Explanations

    abstract concepts and multilingual words

    Negative Logits
    " peer"  0.44
    " harassment"  0.42
    "হিংস"  0.42
    " cravings"  0.42
    "peer"  0.41
    " khuyến"  0.40
    "ziná"  0.40
    "laug"  0.40
    " cies"  0.40
    " токси"  0.39
    Positive Logits
    " சிறிது"  0.43
    " свето"  0.42
    " временем"  0.42
    " shades"  0.41
    "))-"  0.40
    " времени"  0.39
    " cama"  0.39
    " Tiempo"  0.39
    (token not shown)  0.38
    "三年"  0.38
    Act Density 0.002%

    No Known Activations