The document describes the use of convolutional neural networks (CNNs) for speech activity detection in audio signals. Speech activity detection distinguishes speech segments from noise segments in order to identify the start and end times of speech. The approach feeds spectrograms of the audio signals to a CNN model as input. The model was trained on a large Brazilian Portuguese dataset containing over 300 hours of speech and noise data. In an evaluation on the QUT-NOISE-TIMIT dataset, the CNN model achieved a half-total error rate (HTER) of 3.2%, outperforming the traditional MFCC-GMM and energy-threshold baselines.
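
The half-total error rate reported above is the mean of the miss rate (speech frames labelled as non-speech) and the false alarm rate (non-speech frames labelled as speech). The sketch below shows one way it could be computed from frame-level labels. The function name `half_total_error_rate`, the 0/1 label encoding, and the toy labels are assumptions made for illustration; they are not taken from the document, which does not describe its scoring code.

```python
from typing import Sequence


def half_total_error_rate(reference: Sequence[int], predicted: Sequence[int]) -> float:
    """Return HTER = (miss rate + false alarm rate) / 2 for frame-level labels.

    Assumption: labels are 1 for speech and 0 for non-speech.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    # Count how many reference frames belong to each class.
    speech_total = sum(1 for r in reference if r == 1)
    noise_total = len(reference) - speech_total
    if speech_total == 0 or noise_total == 0:
        raise ValueError("both classes must be present to compute HTER")

    # Miss: a speech frame predicted as non-speech.
    misses = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == 0)
    # False alarm: a non-speech frame predicted as speech.
    false_alarms = sum(1 for r, p in zip(reference, predicted) if r == 0 and p == 1)

    miss_rate = misses / speech_total
    false_alarm_rate = false_alarms / noise_total
    return (miss_rate + false_alarm_rate) / 2.0


# Toy example with made-up labels: 4 speech frames followed by 4 non-speech frames.
reference = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 1]
hter = half_total_error_rate(reference, predicted)
print(f"HTER = {hter:.3f}")
```

Because each error rate is normalised by its own class size, the metric is not dominated by whichever class has more frames, which matters when recordings contain far more noise than speech or the reverse.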