    Explanations

    references to religious concepts and figures

    Negative Logits
    Token        Logit
    "僕は"       -0.47
    "僕が"       -0.44
    " dios"      -0.42
    " god"       -0.40
    " anda"      -0.39
    "僕の"       -0.38
    (no token shown)  -0.38
    " deines"    -0.37
    " dieu"      -0.35
    "僕も"       -0.35
    Positive Logits
    Token        Logit
    " His"       2.47
    "His"        1.98
    " Его"       1.64
    " Himself"   1.63
    (no token shown)  1.59
    " He"        1.59
    " Him"       1.49
    " Jego"      1.34
    " Zijn"      1.29
    "He"         1.16
    Act Density 0.460%

    No Known Activations