INDEX
    Explanations

    introducing abbreviations

    Negative Logits
     kittens        0.87
     tinkering      0.86
     verhindert     0.85
     puppies        0.85
     milkshake      0.85
     robbing        0.83
     kitten         0.82
     puppy          0.82
     kada           0.82
     kneading       0.82
    Positive Logits
     hereinafter    2.37
    hereafter       2.35
    hereinafter     2.32
    referred        2.03
     hereafter      2.01
     henceforth     1.75
    简称            1.61
     referred       1.56
     herein         1.48
    hence           1.43
    Act Density 0.095%

    No Known Activations