Our application records audio at the press of a button; it is important to start recording as soon as the interviewer begins asking a question. OpenAI's Whisper model then transcribes the recorded audio. After that, ChatGPT generates an answer to the question and displays it on your screen. The whole process takes only a few seconds, so you can answer the interviewer's questions with little hesitation.
There are a few notes before you begin.
- We intentionally used off-the-shelf APIs so that the solution does not require many resources and can run even on low-power laptops.
- I tested the functionality of the application only on Linux. You may need to modify your audio recording library or install additional drivers for other platforms.
- All code is in the GitHub repository.
What are we waiting for? Let's start!
Because we aim to develop an application that works independently of the platform used for the call, such as Google Meet, Zoom, or Skype, we cannot take advantage of those applications' APIs. Therefore, audio must be captured directly from your computer.
It is important to note that we record not the microphone input, but the audio stream coming from the speakers. After a little googling, I found the soundcard library. The author claims it is cross-platform, so you shouldn't have any problems.
The only drawback for me was that I had to explicitly pass the duration of the recording. However, this problem is solvable because the recording function returns audio data as numpy arrays, which can be concatenated.
So a simple code to record an audio stream through a speaker would be:
import soundcard as sc

RECORD_SEC = 5
SAMPLE_RATE = 48000

# Record the loopback stream of the default speaker (i.e. what you hear).
with sc.get_microphone(
    id=str(sc.default_speaker().name),
    include_loopback=True,
).recorder(samplerate=SAMPLE_RATE) as mic:
    audio_data = mic.record(numframes=SAMPLE_RATE * RECORD_SEC)
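Because the recording function returns plain numpy arrays, you can record in short chunks and stitch them together afterwards, sidestepping the fixed-duration limitation. Below is a minimal sketch of the concatenation step; the chunks are simulated with silence here, whereas in the real app each one would come from a `mic.record` call:

```python
import numpy as np

SAMPLE_RATE = 48000
CHUNK_SEC = 1

# In the real app, each chunk would come from
# mic.record(numframes=SAMPLE_RATE * CHUNK_SEC) inside the recording loop;
# here we simulate five one-second stereo chunks of silence.
chunks = [np.zeros((SAMPLE_RATE * CHUNK_SEC, 2)) for _ in range(5)]

# Chunks concatenate seamlessly into one recording of arbitrary length.
audio_data = np.concatenate(chunks, axis=0)

print(audio_data.shape)  # (240000, 2) -> 5 seconds of stereo audio
```

You can keep appending chunks until the user presses the stop button, then save the combined array exactly as shown below.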
Then save it to a .wav file using the soundfile library:
import soundfile as sf

sf.write(file="out.wav", data=audio_data, samplerate=SAMPLE_RATE)
You will find the code related to audio recording here.
This step is relatively easy as it leverages OpenAI’s pre-trained Whisper model API.
import openai

def transcribe_audio(path_to_file: str = "out.wav") -> str:
    with open(path_to_file, "rb") as audio_file:
        transcript = openai.Audio.translate("whisper-1", audio_file)
    return transcript["text"]
During testing, I found the quality to be excellent, and the model transcribes recorded audio quickly. Additionally, this model can handle languages other than English, so you can conduct interviews in any language.
If you don't want to use the API, you can run the model locally. I recommend Whisper.cpp, a high-performance implementation that does not consume many resources (the author of the library even ran the model on an iPhone 13).
You will find the Whisper API documentation here.
We use ChatGPT to generate answers to the interviewer's questions. Although calling the API seems like a simple task, there are two additional issues to resolve:
- Improve transcript quality — transcripts aren't always perfect. The interviewer may be hard to hear, or the record button may have been pressed a little late.
- Speed up text generation — it's important to receive answers as quickly as possible to keep the conversation flowing and prevent awkward pauses.
1. Improve transcript quality
You can manage this by clearly stating in the system prompt that the input is a potentially incomplete audio transcription.
SYSTEM_PROMPT = """You are interviewing for a INTERVIEW_POSITION position.
You will receive an audio transcription of the question.
Your task is to understand the question and write an answer to it."""
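The system prompt then becomes the first message of every ChatGPT request, with the Whisper transcript as the user turn. Here is a self-contained sketch; the `build_messages` helper is my own illustration, not part of the original code:

```python
SYSTEM_PROMPT = """You are interviewing for a INTERVIEW_POSITION position.
You will receive an audio transcription of the question.
Your task is to understand the question and write an answer to it."""

def build_messages(transcript: str) -> list:
    # The system message frames every request; the (possibly noisy)
    # transcript goes in as the user turn.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": transcript},
    ]

messages = build_messages("So, um, tell me about... yourself?")
# These messages would then be sent to the API, e.g.:
# openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(messages[0]["role"])  # system
```

Framing the input this way lets the model compensate for cut-off or garbled openings instead of answering them literally.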
2. Faster text generation
To accelerate generation, we make two requests to ChatGPT at the same time. This concept is very similar to the approach outlined in the Skeleton-of-Thought article and is visually represented below.
The first request will generate a quick response consisting of 70 words or less. This allows you to continue the interview without any awkward interruptions.
QUICK = "Concisely respond, limiting your answer to 70 words."
The second request returns a more detailed answer. This is necessary to support further engagement in the conversation.
FULL = """Before answering, take a deep breath and think step by step.
Your answer should not exceed more than 150 words."""
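Firing both prompts simultaneously can be sketched with a thread pool. In the snippet below, `ask_chatgpt` is a stub standing in for the real API call (the function name and its body are illustrative, not the article's code):

```python
from concurrent.futures import ThreadPoolExecutor

QUICK = "Concisely respond, limiting your answer to 70 words."
FULL = """Before answering, take a deep breath and think step by step.
Your answer should not exceed more than 150 words."""

def ask_chatgpt(style_prompt: str, question: str) -> str:
    # Stub: in the real app this would call openai.ChatCompletion.create
    # with the style prompt appended to the system message.
    return f"[{style_prompt[:9]}] answer to: {question}"

question = "Tell me about your biggest weakness."

# Both requests are submitted at the same time; the quick answer usually
# finishes first and is shown immediately, while the full answer follows.
with ThreadPoolExecutor(max_workers=2) as pool:
    quick_future = pool.submit(ask_chatgpt, QUICK, question)
    full_future = pool.submit(ask_chatgpt, FULL, question)

print(quick_future.result())
print(full_future.result())
```

In the GUI, each future's result can update the answer text as soon as it arrives, so the short reply bridges the gap while the detailed one is still generating.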
It is worth noting that the prompt uses the phrase "Take a deep breath and think step by step," a technique that recent research has shown to improve response quality.
You will find the code related to the ChatGPT API here.
To display the response from ChatGPT, we need a simple GUI application. After considering several frameworks, I settled on PySimpleGUI, which makes it easy to build GUI applications with a complete set of widgets. It also offers the features I needed:
- Rapid prototyping.
- Native support for running long functions in separate threads.
- Keyboard button controls.
Below is example code for a simple application that sends requests to the OpenAI API in a separate thread using perform_long_operation:
import PySimpleGUI as sg

sg.theme("DarkAmber")

chat_gpt_answer = sg.Text(  # we will update this text later
    "",
    size=(60, 10),
    background_color=sg.theme_background_color(),
    text_color="white",
)

layout = [
    [sg.Text("Press A to analyze the recording")],
    [chat_gpt_answer],
    [sg.Button("Cancel")],
]

WINDOW = sg.Window(
    "Keyboard Test", layout, return_keyboard_events=True, use_default_focus=False
)

while True:
    event, values = WINDOW.read()
    if event in ["Cancel", sg.WIN_CLOSED]:
        break
    elif event in ("a", "A"):  # press A --> analyze
        chat_gpt_answer.update("Making a call to ChatGPT..")
        # generate_answer is the ChatGPT helper from the previous section
        WINDOW.perform_long_operation(
            lambda: generate_answer("Tell me a joke about interviewing"),
            "-CHAT_GPT ANSWER-",
        )
    elif event == "-CHAT_GPT ANSWER-":
        chat_gpt_answer.update(values["-CHAT_GPT ANSWER-"])
You will find the code related to the GUI application here.
Now that you’ve looked at all the necessary components, it’s time to assemble the application. We detailed capturing audio, converting it to text using Whisper, and generating responses using ChatGPT. We also covered creating a simple GUI.
The demo video below shows how all these components work together, which should give you a clearer idea of how to use or adapt this application to suit your needs.
If you want to develop and enhance this solution, here are some improvements you can make:
- Accelerate responses from the LLM: leverage open-source models with fewer parameters, such as LLaMA-2 13B, to reduce response times.
- Use NVIDIA Broadcast: its eye-contact feature makes you appear to be looking at the camera at all times, so the interviewer will not notice that you are reading the answer.
- Create a browser extension: This is especially useful if you are asked to perform live coding during the interview. In such cases, just highlight the question/task.
We have been on an interesting journey to see how artificial intelligence, especially Whisper and ChatGPT, can become useful assistants during job interviews. This application is a glimpse into the future, showing how technology can seamlessly integrate into our daily routines and make our lives a little bit easier. However, the purpose of this app is to explore and understand the potential of generative AI, and it is important to use such technology ethically and responsibly.
Finally, the possibilities with AI are truly endless and exciting. This prototype is just a stepping stone; there is still much to explore. For those who want to delve deeper into this world, the door is open, and you never know what great innovations may be waiting around the corner.