    Negative Logits
        " same"       0.61
        "same"        0.55
        " selben"     0.49
        " საერთ"      0.47
        "同一个"      0.47
        " 동일"       0.46
        " അതേ"        0.44
        " samme"      0.44
        " mesmas"     0.44
        "ौनक"         0.44
    Positive Logits
        "ities"       0.86
        "Similar"     0.71
        "类似"        0.64
        " Similar"    0.63
        " similar"    0.59
        "ily"         0.59
        "類似"        0.57
        " подобные"   0.55
        " similares"  0.54
        "类似的"      0.54
    Activation Density: 0.004%

    No Known Activations