INDEX
Explanations
the word "nearest"
the concept of proximity or nearness
Negative Logits
anity     -0.81
inse      -0.79
raft      -0.74
zinski    -0.72
skill     -0.72
eri       -0.72
era       -0.70
detail    -0.70
ocker     -0.69
ulet      -0.68
Positive Logits
nearest      1.17
closest      0.87
neighb       0.84
neighbour    0.84
nearer       0.83
terness      0.81
Ĭ±           0.80
oreal        0.79
ĸļ           0.78
fart         0.76
Activations Density 0.005%