INDEX
Explanations
mentions of neighboring individuals or actions related to being a neighbor
references to neighbors
Negative Logits
inen      -0.88
anche     -0.85
othal     -0.79
nesota    -0.78
oola      -0.76
otine     -0.76
uggage    -0.75
ulum      -0.75
indal     -0.72
emia      -0.72
Positive Logits
neighbor      1.12
neighbors     1.11
Neigh         1.08
neighbours    1.04
neighbour     1.04
Neighbor      0.87
bors          0.83
liness        0.80
Neigh         0.79
folk          0.77
Activations Density 0.018%
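
The density figure can be illustrated with a minimal sketch, assuming it means the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and do not come from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, of which 2 activate.
toy = [0.0] * 9998 + [1.3, 0.7]
print(f"{activation_density(toy):.3%}")
```

Under this reading, a density of 0.018% would correspond to the feature firing on roughly 18 of every 100,000 tokens.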