    Explanations

    novel and specific items

    Negative Logits
    Token              Value
    (blank in source)  0.40
    "например"         0.39
    "analysis"         0.39
    "例えば"           0.39
    "例如"             0.37
    "over"             0.36
    " например"        0.36
    "hosting"          0.35
    "resources"        0.35
    "some"             0.35
    Positive Logits
    Token              Value
    " bicycles"        0.50
    " cookbooks"       0.49
    " perros"          0.48
    " bicycle"         0.48
    " poultry"         0.48
    " guitar"          0.47
    " cookware"        0.46
    (blank in source)  0.45
    " guitars"         0.45
    " haircuts"        0.43
    Act Density 0.016%

    No Known Activations