    Explanations

    references to specific cities, particularly New York and Los Angeles

    New York, Los Angeles, San Francisco

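    Negative Logits
    Token                   Logit
    "RegressionTest"        -0.54
    "تقاوى"                 -0.53
    "SharedDtor"            -0.47
    " betweenstory"         -0.45
    " Brandon"              -0.43
    "Geplaatst"             -0.42   (Dutch, "posted")
    "balleur"               -0.41
    [token not rendered]    -0.40
    " ویکی‌پدی"             -0.40   (Persian script, fragment of "Wikipedia")
    " RELIEF"               -0.39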
    Positive Logits
    Token                   Logit
    "+#+"                   0.62
    " ſtate"                0.52
    " Majefty"              0.51
    "AutoScaleMode"         0.50
    " uſed"                 0.47
    " Cæsar"                0.47
    "ニューヨーク"          0.46   (Japanese, "New York")
    "AnimationsModule"      0.45
    " useStyles"            0.45
    "ifflin"                0.45
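The Negative Logits and Positive Logits lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way to produce such lists, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and sort the results. The sketch below ignores the model's final LayerNorm, and the helper names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's
    unembedding vector (hypothetical helper; final LayerNorm ignored)."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative logit effects."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy 2-dimensional example (hypothetical vectors, not real model weights).
decoder_dir = [1.0, 0.0]
unembed = {"a": [0.5, 9.0], "b": [-0.3, 1.0], "c": [0.1, 2.0]}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=1)
print(positive)  # [('a', 0.5)]
print(negative)  # [('b', -0.3)]
```

A real dashboard would run the same ranking over the full vocabulary with k = 10, which matches the ten entries shown in each list.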
    Activation Density: 0.013%
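Activation density is the share of token positions on which the feature fires. This sketch assumes a feature counts as active whenever its activation is strictly above zero; that threshold is an assumption, not something this page states. Under that reading, 0.013% corresponds to roughly 13 active positions per 100,000 tokens. The helper name `activation_density` and the data are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("activation_density needs at least one activation")
    return active / total


# Hypothetical data: 13 firing positions out of 100,000 tokens.
acts = [0.0] * 99_987 + [1.2] * 13
print(f"{activation_density(acts) * 100:.3f}%")  # 0.013%
```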

    No Known Activations