INDEX
    Explanations

    author, authorship, author's

    Negative Logits
    "이드"            0.48
    "ダイ"            0.46
    [token missing]   0.45
    "使用者"          0.44
    "ت"               0.43
    " 사용자"         0.42
    "ができる"        0.42
    " conditioners"   0.42
    "الح"             0.41
    " utilizzo"       0.41
    Positive Logits
    " authors"      1.00
    " Authors"      0.99
    " author"       0.94
    " Author"       0.94
    "author"        0.91
    " автор"        0.88
    "Author"        0.86
    "Authors"       0.83
    "authors"       0.82
    " authorship"   0.81
    Activation Density: 0.004%

    No Known Activations