model = AutoModel.from_pretrained(
    "wsntxxn/cnn14rnn-tempgru-audiocaps-captioning",
    trust_remote_code=True,
).to(device)
tokenizer = PreTrainedTokenizerFast.from_pretrained("wsntxxn/audiocaps-simple-tokenizer")
wav, sr = torchaudio.load("/path/to/file.wav")
wav = torchaudio.functional.resample(wav, sr, model.confi...
$ git clone git@github.com:audio-captioning/dacse-2020-baseline.git

The above command will create the directory dacse-2020-baseline and populate it with the contents of this repository. The dacse-2020-baseline directory will be referred to as the root directory for the rest of this README file. For ins...
cd audio_captioning/clip
mkdir -p AudioCLIP/assets
cd AudioCLIP/assets
wget https://github.com/AndreyGuzhov/AudioCLIP/releases/download/v0.1/AudioCLIP-Full-Training.pt
wget https://github.com/AndreyGuzhov/AudioCLIP/releases/download/v0.1/bpe_simple_vocab_16e6.txt.gz ...
Audio captioning recipe (wsntxxn/AudioCaption on GitHub).
WSTAG uses audio captioning data for training. The training data format is the same as AudioGrounding's, the only difference being that there are no segments in phrase_item. You can convert the original captioning data into this format yourself; the phrase parsing rules are provided here. ...
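A minimal sketch of that conversion, assuming captioning items carry an id and a caption; the field names (audio_id, caption, phrases, phrase) and the toy comma-split parser below are illustrative stand-ins, not the repository's actual schema or phrase parsing rules:

```python
# Hypothetical converter: captioning data -> WSTAG-style training format.
# The real phrase-parsing rules live in the repository; a naive comma split
# stands in for them here.

def caption_to_wstag(item, parse_phrases):
    """Wrap a captioning item in a grounding-style record without 'segments'."""
    return {
        "audio_id": item["audio_id"],
        "caption": item["caption"],
        "phrases": [
            {"phrase": p}  # unlike AudioGrounding, no "segments" key
            for p in parse_phrases(item["caption"])
        ],
    }

def naive_phrases(caption):
    """Toy parser: split the caption on commas."""
    return [c.strip() for c in caption.split(",") if c.strip()]

record = caption_to_wstag(
    {"audio_id": "Y1.wav", "caption": "a dog barks, a car passes by"},
    naive_phrases,
)
print(record["phrases"])  # [{'phrase': 'a dog barks'}, {'phrase': 'a car passes by'}]
```

The point of the structure is only that each phrase_item lacks the "segments" timestamps that AudioGrounding data would carry.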
[NeurIPS 2023 - ML for Audio Workshop (Oral)] Zero-shot audio captioning with audio-language model guidance and audio context keywords ...
A repository for my MSc thesis in Data Science & Machine Learning @ NTUA. A deep learning approach to audio fingerprinting for recognizi...
git clone https://github.com/mshukor/UnIVAL.git
pip install -r requirements.txt

Download the following model for captioning evaluation:

python -c "from pycocoevalcap.spice.spice import Spice; tmp = Spice()"

Datasets and Checkpoints
See datasets.md and checkpoints.md. ...
CLAP (Contrastive Language-Audio Pretraining) is a model that learns acoustic concepts from natural language supervision and enables "Zero-Shot" inference. The model has been extensively evaluated on 26 audio downstream tasks, achieving SoTA on several of them, including classification, retrieval, and captioning. ...
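A sketch of the mechanism behind that zero-shot inference, with random vectors standing in for the model's real audio and text encoders: embed the audio clip and a text prompt per candidate label into the shared space, then rank labels by cosine similarity. Everything here (dimensions, labels) is illustrative, not the actual CLAP API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in embeddings; a real CLAP model would produce these with its
# audio and text encoders.
audio_emb = rng.normal(size=512)
labels = ["dog barking", "rain", "siren"]
text_embs = rng.normal(size=(3, 512))
# Nudge the first text embedding toward the audio so it ranks highest.
text_embs[0] = audio_emb + 0.1 * rng.normal(size=512)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Zero-shot classification = nearest text embedding to the audio embedding.
scores = [cosine(audio_emb, t) for t in text_embs]
best = labels[int(np.argmax(scores))]
print(best)  # dog barking
```

Retrieval and captioning reuse the same shared space; only what gets ranked (clips, sentences) changes.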