    Explanations

    phrases that categorize or describe types or kinds of things

    Negative Logits
    "rotum"           -0.86
    "Дереккөздер"     -0.80
    " pleaſure"       -0.80
    " myſelf"         -0.80
    " houſe"          -0.76
    " ſta"            -0.74
    "ViewFeatures"    -0.73
    "+#+#"            -0.73
    " reaſon"         -0.72
    " Roach"          -0.71
    Positive Logits
    " KIND"            1.10
    " sort"            1.06
    " kind"            1.06
    " Kind"            1.04
    " sorta"           1.00
    "KIND"             0.97
    " SORT"            0.95
    " Sort"            0.94
    "kind"             0.94
    "Kind"             0.91
    Activation density: 0.097%

    No Known Activations