Introduction

Powerdrill can now interpret and generate both images and audio. You can use Powerdrill to:

  • Understand images: Upload images and ask Powerdrill questions about them to uncover details you might otherwise miss.

  • Generate images: Describe the picture you have in mind, and Powerdrill generates an image from your description.

  • Convert text to audio: Enter your text, and Powerdrill converts it into synthesized speech.

  • Convert audio to text: Upload audio files to get transcriptions, or discuss their content with Powerdrill.

These capabilities can be combined. For example, you can upload an audio file and ask Powerdrill to create an image to accompany it.

Limitations

  • In a single message, you can upload up to 10 images.

  • Each audio file you upload must be no longer than 5 minutes in duration.

  • Text inputs for conversion to audio must not exceed 5000 characters.
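If you prepare content ahead of time, a small pre-flight check can catch inputs that exceed the limits above before you upload them. The sketch below is a hypothetical helper, not part of Powerdrill; the function name `check_limits` and its parameters are illustrative assumptions. It assumes you already know each audio file's duration in seconds, and it counts characters with Python's `len`, which counts Unicode code points. Powerdrill may count characters differently.

```python
# Hypothetical pre-upload check based on the documented limits.
# Not a Powerdrill API; names and parameters are illustrative.

MAX_IMAGES_PER_MESSAGE = 10   # up to 10 images in a single message
MAX_AUDIO_SECONDS = 5 * 60    # each audio file: no longer than 5 minutes
MAX_TTS_CHARS = 5000          # text-to-audio input: at most 5000 characters


def check_limits(image_count=0, audio_durations_sec=(), tts_text=""):
    """Return a list of problems; an empty list means all inputs are within limits."""
    problems = []
    if image_count > MAX_IMAGES_PER_MESSAGE:
        problems.append(
            f"{image_count} images attached; the limit is "
            f"{MAX_IMAGES_PER_MESSAGE} per message"
        )
    for index, duration in enumerate(audio_durations_sec):
        # A file exactly 5 minutes long is allowed ("no longer than").
        if duration > MAX_AUDIO_SECONDS:
            problems.append(
                f"audio file {index} is {duration} s; the limit is "
                f"{MAX_AUDIO_SECONDS} s"
            )
    # len() counts Unicode code points; the service may count differently.
    if len(tts_text) > MAX_TTS_CHARS:
        problems.append(
            f"text is {len(tts_text)} characters; the limit is {MAX_TTS_CHARS}"
        )
    return problems


if __name__ == "__main__":
    for problem in check_limits(image_count=12, tts_text="x" * 6000):
        print(problem)
```

Returning a list of messages rather than raising on the first failure lets you see every problem with a batch of inputs at once.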

How to use this feature

Prerequisites

You have upgraded your pricing plan to Basic or higher.

Image understanding (Image-to-Text) requires at least the Basic plan. Text-to-Image (T2I), Speech-to-Text (STT), and Text-to-Speech (TTS) require the Pro plan. For details on pricing plans, see the Pricing page.
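The plan requirements above amount to a lookup from each feature to its minimum plan. The following sketch encodes that rule. The feature keys, `PLAN_RANK`, `FEATURE_MIN_PLAN`, and `can_use` are hypothetical names chosen for illustration. The sketch assumes that Pro ranks above Basic and treats any plan it does not list as below Basic.

```python
# Hypothetical encoding of the documented plan requirements.
# Not a Powerdrill API; names are illustrative.

# Assumption: Pro ranks above Basic. Unlisted plans count as rank 0.
PLAN_RANK = {"Basic": 1, "Pro": 2}

FEATURE_MIN_PLAN = {
    "image_understanding": "Basic",  # Image-to-Text
    "text_to_image": "Pro",          # T2I
    "speech_to_text": "Pro",         # STT
    "text_to_speech": "Pro",         # TTS
}


def can_use(plan: str, feature: str) -> bool:
    """Return True if `plan` meets the minimum plan required for `feature`."""
    required = FEATURE_MIN_PLAN[feature]  # raises KeyError for unknown features
    return PLAN_RANK.get(plan, 0) >= PLAN_RANK[required]


if __name__ == "__main__":
    for feature in FEATURE_MIN_PLAN:
        print(feature, can_use("Basic", feature))
```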

Understand images or audio

  1. Sign in to Powerdrill.

  2. Click the Data Exploration option in the center of the page, click the relevant icon, and upload your images or audio.

  3. After your images or audio are uploaded, enter your prompt and click Send.

  4. Based on Powerdrill's response, continue the conversation with follow-up questions.

Generate images or audio

  1. Sign in to Powerdrill.

  2. Click the Data Exploration option in the center of the page, enter your prompt, and click Send.

  3. If you are not satisfied with the generated content, send a follow-up prompt asking Powerdrill to improve it.
