    Negative Logits
    Token            Logit
    " pleaſure"      -0.61
    " myſelf"        -0.55
    " Jefus"         -0.54
    "حياتها"         -0.52
    "Géographie"     -0.51
    "acies"          -0.51
    "ophilus"        -0.51
    "kuuta"          -0.50
    "localctx"       -0.50
    " faſt"          -0.50
    Positive Logits
    Token                 Logit
    " CreateTagHelper"    0.79
    "+#+"                 0.72
    "enderror"            0.68
    " none"               0.65
    "+:+"                 0.63
    " NONE"               0.62
    " שוליים"             0.61
    "NONE"                0.61
    "Dado"                0.61
    " whatsoever"         0.60
    Activation Density: 0.045%

    No Known Activations