    NEGATIVE LOGITS
    Token          Logit
    "Tikang"       -1.03
    " Anſ"         -1.02
    " Theſe"       -1.01
    "SOUNDBITE"    -0.98
    "+#+#"         -0.97
    " myſelf"      -0.96
    " Diſ"         -0.94
    " Beſ"         -0.93
    " Majefty"     -0.92
    " Limburg"     -0.92
    POSITIVE LOGITS
    Token          Logit
    " Ar"          2.02
    "Ar"           1.94
    " ar"          1.83
    "ar"           1.64
    " AR"          1.48
    "AR"           1.40
    " Ар"          1.27
    "Ар"           1.14
    " Aron"        1.04
    " ар"          1.00
    Activation Density: 0.046%

    No Known Activations