INDEX
Explanations
references to or mentions of neighbors
references to neighbors or neighboring relationships
Negative Logits
Token       Logit
uggage      -0.82
othal       -0.77
indal       -0.76
enei        -0.76
obin        -0.75
inen        -0.75
anwhile     -0.73
itech       -0.70
aneers      -0.69
xit         -0.68
Positive Logits
Token       Logit
neighbor     1.09
Neigh        1.07
neighbors    0.98
neighbour    0.94
liness       0.92
bors         0.91
folk         0.87
neighbours   0.86
Neighbor     0.84
hood         0.81
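The positive and negative lists are the tokens with the largest and smallest logit values for this feature. A minimal sketch of how such top-k and bottom-k lists can be pulled out, assuming each token already has one scalar logit value. The function name top_and_bottom_tokens and the small dict standing in for a full vocabulary are hypothetical, chosen for illustration only. The example values come from the lists above.

```python
import heapq

def top_and_bottom_tokens(logit_effects, k=10):
    """Return the k highest and k lowest (token, logit) pairs.

    Hypothetical helper: assumes logit_effects maps each token string to a
    single precomputed logit value for the feature.
    """
    items = list(logit_effects.items())
    # nlargest returns items sorted from highest to lowest value.
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    # nsmallest returns items sorted from lowest to highest value.
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    return positive, negative

# A few values from the lists above, standing in for a full vocabulary.
effects = {
    "neighbor": 1.09, "Neigh": 1.07, "neighbors": 0.98,
    "uggage": -0.82, "othal": -0.77, "indal": -0.76,
}
pos, neg = top_and_bottom_tokens(effects, k=2)
print(pos)  # [('neighbor', 1.09), ('Neigh', 1.07)]
print(neg)  # [('uggage', -0.82), ('othal', -0.77)]
```

Using heapq avoids sorting the entire vocabulary when only a handful of extremes are needed.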
Activation Density: 0.034%
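A density this low means the feature is silent on nearly every token. A minimal sketch of the calculation, assuming density is the percentage of sampled token positions where the feature's activation exceeds zero. The function name activation_density, its threshold parameter, and the toy activation list are hypothetical and are not this page's data.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the activation exceeds threshold.

    Hypothetical helper: assumes activations is a flat list holding the
    feature's activation at each sampled token position.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy sample: the feature fires on 3 of 10,000 tokens.
acts = [0.0] * 9997 + [1.2, 0.4, 3.1]
print(activation_density(acts))  # 0.03
```

At 0.034%, the feature would be active on roughly 3 or 4 tokens out of every 10,000.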