    Explanations

    Non-English characters

    Negative Logits
    Token                 Logit
    " loving"             -0.07
    (no visible token)    -0.07
    " anticipated"        -0.07
    "arians"              -0.06
    "travel"              -0.06
    " вра"                -0.06
    " fault"              -0.06
    "605"                 -0.06
    "Division"            -0.06
    "(un"                 -0.06
    Positive Logits
    Token                 Logit
    " фев"                0.06
    " предназнач"         0.06
    "γρα"                 0.06
    " Jenner"             0.06
    "	opts"               0.06
    "ğ"                   0.06
    (no visible token)    0.06
    " Humph"              0.06
    " Při"                0.06
    " \%"                 0.06
    Activation Density: 0.107%

    No Known Activations