    Explanations

    classification

    Negative Logits
     развития      -0.07
    Pont           -0.07
    _geom          -0.07
     інтерес       -0.07
     ремон         -0.07
    -[             -0.07
    nicas          -0.07
     τρό           -0.07
     hic           -0.07
     bikini        -0.06
    Positive Logits
     classifiers   0.10
     classifier    0.08
    Classifier     0.08
     Lesser        0.07
     interchange   0.06
    .intersection  0.06
     initialise    0.06
    μαι            0.06
    ражд           0.06
    _classifier    0.06
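The two lists above pair each vocabulary token with a logit value. One common way to produce such lists for a sparse-autoencoder feature is to project the feature's decoder direction through the model's unembedding matrix and keep the most negative and most positive tokens. This is an assumption: the page does not say how these values were computed. The minimal sketch below uses plain Python lists. `logit_effects`, `decoder_dir`, and `w_u` are hypothetical names, and `w_u` is assumed to be stored as d_model rows by vocab-size columns.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical sketch: score each token by decoder_dir . W_U[:, token].

    Assumes w_u has one row per residual-stream dimension and one column
    per vocabulary token. Returns (k most negative, k most positive) as
    (token, score) pairs.
    """
    if len(w_u) != len(decoder_dir):
        raise ValueError("w_u must have one row per residual-stream dimension")
    scores = []
    for t, token in enumerate(vocab):
        # Dot product of the feature direction with this token's unembedding column.
        s = sum(d * row[t] for d, row in zip(decoder_dir, w_u))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # descending by score
    return negative, positive


if __name__ == "__main__":
    # Toy example: 2-dimensional residual stream, 3-token vocabulary.
    neg, pos = logit_effects(
        [1.0, 0.5],
        [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]],
        ["a", "b", "c"],
        k=1,
    )
    print("negative:", neg)  # negative: [('c', -1.0)]
    print("positive:", pos)  # positive: [('a', 1.0)]
```

Ranking by the raw dot product, rather than by a softmax over it, keeps the two lists symmetric: the same scores are read from either end.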
    Activation Density 0.005%
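Activation density is read here as the percentage of token positions on which the feature fires, meaning its activation is above zero. That definition and the `threshold` parameter are assumptions, not taken from this page. Under this reading, 0.005% corresponds to about 5 firing tokens per 100,000. A minimal sketch, with `activation_density` as a hypothetical helper:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds threshold.

    The strict "> threshold" rule with threshold=0.0 is an assumed definition.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # 5 firing positions out of 100,000 tokens.
    acts = [0.0] * 99_995 + [1.3] * 5
    print(activation_density(acts))
```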

    No Known Activations