Sockets Tutorial with Python 3 part 2 - buffering and streaming data




Welcome to part 2 of the sockets tutorial with Python. In the previous tutorial, we learned how we could send and receive data using sockets, but then we illustrated the problem that can arise when our communication exceeds our buffer size. In this tutorial, we'll talk about overcoming this!

As mentioned before, there are a few logical ways that you could handle this, but one common way is to start every message with a header that contains the length of the message that is going to come. The next challenge is normalizing this header in some way. You might consider using some series of characters, or some format, to mark where a message starts and ends, but then you run the risk of people accidentally, or purposefully, mimicking this formatting. Instead, you can go with a fixed-length header, where the first n bytes of data will be the header data, which will include the length of the message to come. Once we've received that much data, we know anything that follows belongs to a new message, so we grab the next header and keep repeating this process.

So what we need to do now is choose some truly maximal message size. Say, 1,000,000,000 bytes. Right, there's almost no circumstance where someone would attempt anything even close to this via our chat app, so this will be fine. Written out as text, that number is 10 characters long (10 bytes), so a 10-character header can hold any length up to 9,999,999,999. In Python, how might we represent any number as 10 characters? We can use string formatting! Yay basics! Since this is a lesser-used functionality, see the format examples in the Python documentation, where you will find examples like:


# Aligning the text and specifying a width:

>>> '{:<30}'.format('left aligned')
'left aligned                  '
>>> '{:>30}'.format('right aligned')
'                 right aligned'
>>> '{:^30}'.format('centered')
'           centered           '
>>> '{:*^30}'.format('centered')  # use '*' as a fill char
'***********centered***********'

In this case, you can see various examples where there are 30 characters used every time, but you can do various alignments. While this is mainly used to make text-based GUIs pretty, we can also use this for our purposes, like:

f'{len("your message here!"):<10}'

In the above case, this will produce the length of our message using 10 characters.

>>> f'{len("your message here!"):<10}'
'18        '
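
To convince ourselves that this header can be read back, here is a minimal sketch of the round trip. The helper names make_header and parse_header are made up for this example and are not part of the tutorial's scripts. It relies on the fact that int() ignores surrounding whitespace, for both str and bytes input, and it adds a guard for lengths that would not fit in 10 characters:

```python
HEADERSIZE = 10

def make_header(payload_len):
    """Left-align the length in a field of exactly HEADERSIZE characters."""
    header = f"{payload_len:<{HEADERSIZE}}"
    # The format spec only sets a minimum width, so a number with more
    # than HEADERSIZE digits would silently produce a longer header.
    if len(header) > HEADERSIZE:
        raise ValueError("length does not fit in the header")
    return header

def parse_header(header):
    """int() strips the padding spaces; it accepts str or bytes."""
    return int(header)

header = make_header(len("your message here!"))
print(repr(header))                          # '18        '
print(parse_header(header))                  # 18
print(parse_header(header.encode("utf-8")))  # 18
```

The largest length that fits is 9,999,999,999; anything bigger raises the ValueError instead of corrupting the framing.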

All we do now is prepend this to all of our messages, then we can convert the first 10 chars of each new message to an int to know how much more data is part of that message. To do this, we'll start in our server script:

server.py

import socket

HEADERSIZE = 10

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((socket.gethostname(), 1241))
s.listen(5)

while True:
    # now our endpoint knows about the OTHER endpoint.
    clientsocket, address = s.accept()
    print(f"Connection from {address} has been established.")

    msg = "Welcome to the server!"
    msg = f"{len(msg):<{HEADERSIZE}}"+msg

    clientsocket.send(bytes(msg,"utf-8"))

So now our messages will have a header of 10 characters/bytes that will contain the length of the message, which our client will use to tell when the end of the message has been received. Let's work on the client.py next:
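
One thing worth knowing: len(msg) on a string counts characters, not bytes. For the plain ASCII messages in this tutorial the two are the same, so the scripts here work fine. If you ever send non-ASCII text, the counts can differ. A possible variation, assuming you want the header to describe the number of bytes on the wire, is to encode first and measure the encoded payload. The frame helper below is a hypothetical name used only for this sketch:

```python
HEADERSIZE = 10

def frame(text):
    """Encode first, then build the header from the byte length."""
    payload = text.encode("utf-8")
    header = f"{len(payload):<{HEADERSIZE}}".encode("utf-8")
    return header + payload

word = "héllo"
print(len(word))                  # 5 characters
print(len(word.encode("utf-8")))  # 6 bytes, since é takes two bytes in UTF-8
print(frame(word))
```

If you go this route, the receiving side needs to count bytes as well, rather than the length of the decoded string.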

client.py

import socket

HEADERSIZE = 10

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((socket.gethostname(), 1241))

while True:
    full_msg = ''
    new_msg = True
    while True:
        msg = s.recv(16)
        if new_msg:
            print("new msg len:",msg[:HEADERSIZE])
            msglen = int(msg[:HEADERSIZE])
            new_msg = False

        print(f"full message length: {msglen}")

        full_msg += msg.decode("utf-8")

        print(len(full_msg))


        if len(full_msg)-HEADERSIZE == msglen:
            print("full msg recvd")
            print(full_msg[HEADERSIZE:])
            new_msg = True

This one is a bit more involved, but nothing too crazy here. I increased our buffer to 16 bytes. 8 wouldn't even be enough to read the header, so that would have been a problem, and you would probably never have a buffer as small as these anyway. We're just doing it for example. So, we start off in a state where the next bit of data we get is a new_msg.

If the message is a new_msg, then the first thing we do is parse the header, which we already know is a fixed-length of 10 characters. From here, we parse the message length. Then, we continue to build the full_msg, until that var is the size of msglen + our HEADERSIZE. Once this happens, we print out the full message.
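
A caveat with this approach: it assumes each chunk we read belongs to exactly one message. With a stream socket, a single recv can also return the tail of one message together with the start of the next. A more defensive variation, sketched here without any sockets so it is easy to run, keeps a byte buffer and pulls out every complete message it contains. The function name extract_messages and the sample stream are made up for this illustration:

```python
HEADERSIZE = 10

def extract_messages(buffer):
    """Return (list of complete payloads, leftover bytes)."""
    messages = []
    # We can only parse a header once all HEADERSIZE bytes are here.
    while len(buffer) >= HEADERSIZE:
        msglen = int(buffer[:HEADERSIZE])
        end = HEADERSIZE + msglen
        if len(buffer) < end:
            break  # the payload is not complete yet, wait for more data
        messages.append(buffer[HEADERSIZE:end])
        buffer = buffer[end:]
    return messages, buffer

# Three framed messages back to back; the last one is cut short.
stream = b"5         hello" + b"3         abc" + b"4         wx"

buffer = b""
received = []
for start in range(0, len(stream), 16):  # pretend each recv returns 16 bytes
    buffer += stream[start:start + 16]
    complete, buffer = extract_messages(buffer)
    received.extend(complete)

print(received)  # [b'hello', b'abc']
print(buffer)    # b'4         wx'
```

Notice that the first 16-byte chunk already contains one byte of the second header, and the parser still recovers both messages.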

Going from this to some sort of streaming API is quite simple. Let's do an example where the server just streams out something simple, like the current time.

To do this, we import time at the top of server.py and add the following to the end of the connection loop, right after the welcome message is sent:

    while True:
        time.sleep(3)
        msg = f"The time is {time.time()}"
        msg = f"{len(msg):<{HEADERSIZE}}"+msg

        print(msg)

        clientsocket.send(bytes(msg,"utf-8"))

Our full server.py is now:

import socket
import time


HEADERSIZE = 10

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((socket.gethostname(), 1243))
s.listen(5)

while True:
    # now our endpoint knows about the OTHER endpoint.
    clientsocket, address = s.accept()
    print(f"Connection from {address} has been established.")

    msg = "Welcome to the server!"
    msg = f"{len(msg):<{HEADERSIZE}}"+msg

    clientsocket.send(bytes(msg,"utf-8"))

    while True:
        time.sleep(3)
        msg = f"The time is {time.time()}"
        msg = f"{len(msg):<{HEADERSIZE}}"+msg

        print(msg)

        clientsocket.send(bytes(msg,"utf-8"))
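
A side note on send: it returns the number of bytes that were actually sent, which can be fewer than you asked for. For tiny messages like ours this is unlikely to matter, but for larger payloads you may prefer sendall, which keeps sending until everything has gone out or an error occurs. Here is a small sketch that tries this without a real server by using socket.socketpair() to get two connected sockets in one process. The send_msg helper is a hypothetical name, not part of the scripts above:

```python
import socket

HEADERSIZE = 10

def send_msg(sock, text):
    """Prefix the text with its fixed-width length header and send it all."""
    msg = f"{len(text):<{HEADERSIZE}}" + text
    sock.sendall(bytes(msg, "utf-8"))

# Two already-connected sockets, standing in for server and client.
server_side, client_side = socket.socketpair()

send_msg(server_side, "Welcome to the server!")

# 10 header bytes + 22 message bytes = 32 bytes expected in total.
data = b""
while len(data) < 32:
    data += client_side.recv(16)

print(data)  # b'22        Welcome to the server!'

server_side.close()
client_side.close()
```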

Now, almost nothing changes for our client. The only preparation we need is for accepting new messages after the first one, which means resetting the full_msg var once a message is complete. Note that both scripts now use port 1243. Our full client.py becomes:

import socket

HEADERSIZE = 10

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((socket.gethostname(), 1243))

while True:
    full_msg = ''
    new_msg = True
    while True:
        msg = s.recv(16)
        if new_msg:
            print("new msg len:",msg[:HEADERSIZE])
            msglen = int(msg[:HEADERSIZE])
            new_msg = False

        print(f"full message length: {msglen}")

        full_msg += msg.decode("utf-8")

        print(len(full_msg))


        if len(full_msg)-HEADERSIZE == msglen:
            print("full msg recvd")
            print(full_msg[HEADERSIZE:])
            new_msg = True
            full_msg = ""
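
One case this client does not handle is the server going away. When the other side closes the connection, recv returns an empty bytes object, and trying to parse that as a header will fail. As a rough sketch of one way to deal with it, assuming you are happy to raise an exception on disconnect, you could read an exact number of bytes at a time. The names recv_exact and recv_msg are made up for this example, and socket.socketpair() stands in for a real server and client:

```python
import socket

HEADERSIZE = 10

def recv_exact(sock, n):
    """Keep calling recv until exactly n bytes have arrived."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(min(remaining, 16))
        if not chunk:
            # An empty result means the other side closed the connection.
            raise ConnectionError("socket closed before the message was complete")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_msg(sock):
    """Read one header, then exactly as many bytes as it announces."""
    msglen = int(recv_exact(sock, HEADERSIZE))
    return recv_exact(sock, msglen).decode("utf-8")

server_side, client_side = socket.socketpair()

text = "The time is 1552068338.0437"
server_side.sendall(bytes(f"{len(text):<{HEADERSIZE}}" + text, "utf-8"))

first = recv_msg(client_side)
print(first)

server_side.close()  # simulate the server shutting down

disconnected = False
try:
    recv_msg(client_side)
except ConnectionError as exc:
    disconnected = True
    print("disconnected:", exc)

client_side.close()
```

Because each read asks for no more than what is still missing, this version also never reads past the end of the current message.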

Now, run both scripts, starting the server first, and you should see the server outputting something like:

28        The time is 1552068299.01783
30        The time is 1552068302.0181189
30        The time is 1552068305.0206459
29        The time is 1552068308.021842
29        The time is 1552068311.024837
29        The time is 1552068314.025016
28        The time is 1552068317.02619
29        The time is 1552068320.026504
29        The time is 1552068323.031633
30        The time is 1552068326.0359411
29        The time is 1552068329.039903
29        The time is 1552068332.040124
30        The time is 1552068335.0402749
27        The time is 1552068338.0437
29        The time is 1552068341.043971

And the client:

new msg len: b'27        '
full message length: 27
16
full message length: 27
32
full message length: 27
37
full msg recvd
The time is 1552068338.0437
new msg len: b'29        '
full message length: 29
16
full message length: 29
32
full message length: 29
39
full msg recvd
The time is 1552068341.043971
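
If the running totals of 16, 32, 37 look odd, they are just the 10-character header plus the 27-character message arriving in chunks of up to 16. This little sketch reproduces the arithmetic without any networking, assuming every read returns a full 16 bytes until the data runs out (a real socket does not guarantee that):

```python
HEADERSIZE = 10
BUFFER = 16

text = "The time is 1552068338.0437"
wire = f"{len(text):<{HEADERSIZE}}" + text  # header followed by the message

running = 0
totals = []
for start in range(0, len(wire), BUFFER):
    running += len(wire[start:start + BUFFER])
    totals.append(running)

print(len(text), len(wire), totals)  # 27 37 [16, 32, 37]
```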

In the next tutorial, we'll be talking about how we can send Python objects rather than just strings.
