INDEX
    Explanations

    beautiful and positive adjectives

    Negative Logits
    有问题    0.42
     theirs    0.39
    athyroid    0.39
     여기가    0.39
    (token text missing)    0.38
    темати    0.37
     கிடைத்தது    0.37
     Removed    0.37
     этого    0.36
    (token text missing)    0.36
    Positive Logits
     صفحات    0.52
     páginas    0.49
     fotos    0.49
    !!”    0.45
     photographs    0.45
     foto    0.44
    歌曲    0.44
     pages    0.43
     पृष्ठ    0.42
     paintings    0.42
    Act Density 0.000%

    No Known Activations