    Explanations

    tokens with linguistic or numeric symbols

    Negative Logits
      " houſe"       -0.68
      " fubject"     -0.67
      " Jefus"       -0.66
      " Chriftian"   -0.66
      " Theſe"       -0.65
      " Monfieur"    -0.65
      " xenia"       -0.64
      " ſtate"       -0.64
      " ejus"        -0.62
      " Italij"      -0.61
    Positive Logits
      "ôles"          0.52
      "っこう"          0.51
      " Ko"           0.51
      " memas"        0.49
      "boten"         0.47
      " li"           0.47
      " ten"          0.47
      "lained"        0.47
      " then"         0.46
      "transQ"        0.46
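The positive and negative logit lists rank vocabulary tokens by how strongly this feature pushes their next-token logits up or down. Below is a minimal sketch of how such lists can be produced. It assumes, without confirmation from this page, that each value is the dot product of the feature's decoder direction with one column of the model's unembedding matrix, with no layer-norm scaling. The function names and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    Assumption: unembed is laid out as (d_model, vocab_size), so the result
    is decoder_dir @ unembed, one logit contribution per vocabulary token.
    No layer-norm scaling is applied (also an assumption).
    """
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_logits(effects: Sequence[float], tokens: Sequence[str],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit) pairs,
    each list ordered from the strongest effect to the weakest."""
    order = sorted(range(len(effects)), key=lambda j: effects[j])  # ascending
    negative = [(tokens[j], round(effects[j], 2)) for j in order[:k]]
    positive = [(tokens[j], round(effects[j], 2)) for j in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, vocabulary of three tokens.
    effects = logit_effects([1.0, -0.5], [[0.2, -0.4, 0.1], [0.6, 0.2, -0.3]])
    neg, pos = top_logits(effects, ["a", "b", "c"], k=1)
    print(neg, pos)  # [('b', -0.5)] [('c', 0.25)]
```

The sketch sorts the full vocabulary once. Both lists then fall out as slices of that single ordering, matching the strongest-first layout shown above.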
    Activation Density: 0.584%
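Activation density is the share of tokens on which the feature is active. A density of 0.584% corresponds to roughly 5.84 tokens in every 1,000. Below is a minimal sketch of the calculation. It assumes that "active" means an activation strictly above zero and that the percentage is taken over every token position in the sample the dashboard used (that sample is not described here). The function name is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds threshold.

    Assumption: a feature "fires" when its activation is strictly greater than
    threshold (0.0 by default), and density is counted over every token
    position in the evaluation sample.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 5 firing positions out of 1,000.
print(activation_density([0.0] * 995 + [1.2] * 5))  # 0.5
```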

    No Known Activations