    Negative Logits
    Token             Logit
    (blank)           0.95
    "<unused1869>"    0.91
    " setbacks"       0.90
    "<unused353>"     0.88
    "<unused1080>"    0.87
    " optimistic"     0.87
    "<unused1072>"    0.84
    "<unused2217>"    0.83
    " وا"             0.83
    "<unused323>"     0.83
    Positive Logits
    Token             Logit
    (blank)           1.13
    (blank)           0.67
    "하지만"          0.64
    "Minimum"         0.63
    " 무료"           0.63
    "외부"            0.62
    "System"          0.62
    " системи"        0.62
    " 좋은"           0.61
    " 국제"           0.61
    Activation Density: 0.039%

    No Known Activations