Tejas Mandre | Blog

Yield keyword in python - Real life use case

By in software

How does the yield keyword work?

One of the rarely spoken keywords in python is the "yield" keyword. It returns a generator object from the function it is used in. 

What is a generator? It is an entity that gets calculated on demand. For eg: Addition of two numbers. "2 + 3" this only gets computed when needed in the code during the run time.

def add():
    yield 2 + 3

result = add()

print(result)   # prints <generator_object_at_xxxxx>

for i in r:
    print(i)   # prints 5

In the code example above the calculation is done only when the generator object is evaluated by running it in a loop. This can also be used while constructing lists to compute the lists lazily.

To understand this better lets look at an analogy. Consider you are watching a photo album on your TV. You have two choices: Put it in slide show mode and the pictures keep on changing automatically or you press a button on the remote to change the picture. The second option is exactly what generators are. Lets look at another code example.

def my_function():
    for i in range(10):
        yield i

result = my_function()

print(next(result))   # prints 0
print(next(result))   # prints 1
print(next(result))   # prints 2
print(next(result))   # prints 3

The "next()"  call is just like pressing the button on the remote to view the next photo in the album. In other words "computing" on demand.

Exercise: What happens after calling "next()" 10 times consecutively ?

Usecase: How did I use this at work?

I had to write a celery task that ran every 30 mins and triggered a Jenkins job and then tracked its status. The Jenkins job took 3 hours to complete on an average. Following are steps in the task.

  • Trigger a Jenkins job
  • Wait until completion (~ 3 hours)
  • Read the artifact and store in the database

Ref to get an overview of celery: What is celery? (Not a must read for this blog.)

This worked fine until an issue occurred. A task was in the middle of the execution on the celery worker and the worker crashed. After the restart, the task was gone. Neither in the task queue nor in the memory of the worker. The Jenkins job however was still running at its last stage. This way the compute expensive ~3 hour long Jenkins job would go waste as the worker would never be able to get back its status and artifact to be stored in the database. The instant solution that occured to me was to have a table that stored the in progress builds with the build URL of the Jenkins job in the database. So that the Jenkins job would not be wasted even across the worker restarts. Post each job completion the URL would be deleted from the database. In case of the worker crash during the execution we would still have the URL in the database that can be looked at later.

Coming to the Jenkins trigger job function it looked something like this:

def trigger_job():
    build_url  = call_jenkins_api()
    while True:
        status = poll_build_url_for_status()
        if status != "IN_PROGRESS":
            return status
    

In order to implement the above solution we somehow needed the build_url instantly so that we could store it in the DB right at the start of the execution and then wait for the status to be returned. In order to achieve I modified the above function to look like this:

def trigger_job():
    build_url  = call_jenkins_api()
    yield build_url
    while True:
        status = poll_build_url_for_status()
        if status != "IN_PROGRESS":
            yield status

This returned the build_url inside the celery task instantly. This build url could now be stored in the DB. Further it blocked the exection of the task until the final status was returned from the function. The celery task now looked like this:

def celery_task():
    result  = trigger_job()
    build_url = next(result) # build_url is obtained instantly
    # store the build url in the db
    status = next(result) # status is obtained after 3 hours

"yield" was indeed the saviour for a critical quick hot fix that stopped the compute expensive Jenkins jobs being wasted and save on the overall computing resources.

Of course, there are better solutions to this problem than the above mentioned but I just wanted to highlight the use of the yield keyword. The code samples are just illustrations and the actual use case was more complex than this one.

Thanks for reading!

- Tejas