INDEX
Explanations
phrases related to proximity or closeness
phrases that indicate proximity or spatial relationships
Negative Logits
FORE      -0.81
ANS       -0.76
rey       -0.74
enders    -0.68
uct       -0.67
ense      -0.66
gae       -0.65
Meta      -0.65
ulous     -0.63
hyde      -0.62
Positive Logits
bounds    1.18
spitting  1.12
reach     1.08
sight     0.95
range     0.95
touching  0.87
limits    0.86
inches    0.85
grasp     0.85
striking  0.82
Activations Density 0.049%