    Explanations

    questions or exclamations

    Negative Logits
    Token             Logit
    .                 -0.39
    :                 -0.31
    ことです          -0.31
     nemlig           -0.30
     biztos           -0.30
     loading          -0.28
     Loading          -0.28
    [not rendered]    -0.28
    Loading           -0.28
    [not rendered]    -0.28
    Positive Logits
    Token             Logit
    ?"                2.06
    ?”                1.98
    ?’                1.97
    ?)                1.96
    ?'                1.95
    ?]                1.91
    [not rendered]    1.90
    ?),               1.80
    ?).               1.80
    ?",               1.80
    Activation Density: 0.094%

    No Known Activations