Broadcasting
In this blog, I am going to break down the idea of broadcasting in PyTorch. Broadcasting is one of the key concepts you should understand to get the best out of deep learning frameworks. Let's see what it is.
To start with, say you have two variables, x and y, as below:
import torch

x = torch.tensor(10)
y = torch.tensor([100,200,300])
Now, let's say you want to add x to each element in y. We can simply do it as follows:
z = x + y;z
But how does this actually work? x is just a scalar, and we added it to a tensor. This works with the help of broadcasting. Let's see how it works under the hood. Say we have two tensors, a and b, as below:
a = torch.tensor([1., 2, 3])
b = torch.tensor([[10.,20,30],
                  [40,50,60]])
a,b
a.shape, b.shape
Let's see what happens if we add them up...
c = a + b;c
c.shape
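If you run this, c should come out as tensor([[11., 22., 33.], [41., 52., 63.]]) with shape torch.Size([2, 3]): every row of b has a added to it, element by element.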
So what actually happened is that PyTorch broadcast tensor a to the same size as b and then performed an element-wise addition. In other words, PyTorch first expanded a to match the dimensions of b and then did the element-wise addition, as shown below:
a.expand_as(b)
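To make that concrete, here is a quick check of my own (just an illustration) that adding b to the explicitly expanded version of a gives exactly the same result as the broadcast addition above:

torch.equal(a + b, a.expand_as(b) + b)   # True: broadcasting and explicit expansion agree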
See? We just replicated tensor a to match the dimensions of tensor b. But one may ask, how is this so useful if it is going to fill up the memory with copies of the same data? The answer is that it does not copy the data in memory. The tensor still only holds the original data we gave it at initialization.
a.storage()
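Another quick way to convince yourself of this (again, just a sketch of my own) is to check that the expanded tensor points at exactly the same memory as a:

a.expand_as(b).data_ptr() == a.data_ptr()   # True: expand_as allocates no new memory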
As you can see, it only contains the initial data we gave it. Then how does it perform the broadcasting? Well, PyTorch uses a neat trick with strides: the same elements in memory are read repeatedly, which makes the tensor look like it has the matching dimensions.
w = a.expand_as(b)
w, w.shape
w.stride()
What does this mean? When we initialized tensor a at the beginning, the values 1, 2, and 3 were placed in adjacent memory cells. The stride tells PyTorch how many memory locations to move to take one step along each axis: the first element refers to axis 0 and the second to axis 1. So the 1 in (0, 1) means move one memory location to get to the next column, whereas the 0 means do not move at all to get to the next row, i.e., row-wise. Every row therefore reads the same three values.
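We can make the stride arithmetic explicit. This little check is just an illustration (row, col, and offset are names I made up): the memory offset of w[row, col] is row * stride(0) + col * stride(1), and since a holds the underlying data, indexing a at that offset gives back the same value:

row, col = 1, 2
offset = row * w.stride(0) + col * w.stride(1)   # 1*0 + 2*1 = 2
a[offset], w[row, col]                           # both are tensor(3.)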
So how can we get a higher-dimensional tensor from a lower-dimensional one? Well, there are two ways to do that: the first is to use unsqueeze(dim), and the second is to index our initial tensor with [None].
a, a.shape
a.unsqueeze(0), a[None,:]
a.unsqueeze(0).shape, a[None,:].shape
We can always skip trailing ':'s, and you will see that in many cases. Furthermore, we can use '...' to imply all the preceding dimensions.
a[None].shape, a
a[...,None].shape, a
As you see, we can expand the dimensions easily. The argument we pass into unsqueeze() is the position of the new axis we want to add.
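For example (a quick sketch of my own), passing 1 instead of 0 puts the new axis in the second position, and indexing with [:, None] does the same thing:

a.unsqueeze(1).shape, a[:, None].shape   # both torch.Size([3, 1])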
However, there are certain rules associated with broadcasting:
- Two tensors are compatible for element-wise operations if, comparing their shapes from right to left, each pair of dimensions is
  - equal, or
  - one of them is 1, in which case broadcasting stretches that dimension to match the other (see the example after this list).
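To make the rules concrete, here is a small sketch (the shapes are my own, just for illustration):

m = torch.ones(2, 3)
v = torch.ones(3)          # shapes (2, 3) and (3,): the trailing dimensions match, so this broadcasts
(m + v).shape              # torch.Size([2, 3])

col = torch.ones(2, 1)     # shapes (2, 3) and (2, 1): the 1 is stretched to 3
(m + col).shape            # torch.Size([2, 3])

# torch.ones(2, 3) + torch.ones(2) would fail: 3 and 2 are neither equal nor 1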
And that's a wrap! I hope this helped you understand the concept of broadcasting a little more intuitively.