INDEX
    Explanations

    phrases that indicate similarity or comparison

    Negative Logits
    -0.59
     OSError
    -0.52
     houſe
    -0.48
     Reſ
    -0.47
     Houſe
    -0.47
     ſhall
    -0.46
     ſtate
    -0.46
     ſche
    -0.46
     baum
    -0.46
     faſt
    -0.46
    Positive Logits
    例えば
    0.60
     kuten
    0.59
     including
    0.58
     like
    0.58
     např
    0.57
    including
    0.57
     включая
    0.56
     Including
    0.55
    like
    0.55
    เช่น
    0.55
    Act Density 0.291%

    No Known Activations