INDEX
    Explanations

    phrases expressing types or categories

    Negative Logits
    Token         Logit
    "htë"         -0.45
    "васто"       -0.44
    " سد"         -0.43
    " spalle"     -0.42
    "douard"      -0.42
    "MockBean"    -0.41
    "Cardiff"     -0.40
    "utches"      -0.40
    "lava"        -0.40
    "igraf"       -0.40

    Positive Logits
    Token         Logit
    " kind"       1.82
    "kind"        1.74
    "Kind"        1.68
    " KIND"       1.68
    " Kind"       1.61
    "KIND"        1.49
    "kinds"       1.43
    " kinds"      1.35
    " Kinds"      1.23
    " sort"       1.18
    Activation Density: 0.085%

    No Known Activations