Explanations
neighborhood-related terms
references to neighborhoods and community contexts
Negative Logits
Token          Logit
bearer         -0.75
isted          -0.73
Mehran         -0.67
ista           -0.66
REDACTED       -0.64
whip           -0.63
arian          -0.63
ISM            -0.63
displayText    -0.60
istic          -0.60
Positive Logits
Token          Logit
bors           1.44
Neigh          1.34
Neigh          1.31
bour           1.18
neighbours     1.04
neighbour      1.03
neighb         1.03
neighbors      1.01
bor            0.96
stairs         0.91
Activations Density: 0.014%