Explanations
words related to physical barriers or obstacles, especially fences
references to fences
Negative Logits
Token       Logit
nant        -0.72
forth       -0.67
occ         -0.64
practice    -0.63
esta        -0.63
alg         -0.62
Nir         -0.62
ounces      -0.62
arin        -0.60
)=(         -0.60
Positive Logits
Token       Logit
fence       1.50
fences      1.24
fencing     1.08
-+-+        0.82
encl        0.81
este        0.80
vine        0.78
yard        0.76
perimeter   0.75
gate        0.74
Activations Density: 0.007%