NN sequential grayscale

Hi, I am trying to create a model similar to the LSTM RNN from lesson 8 (course v4), but instead of text input data I want to feed in a sequence of images. My input data is grayscale images with a batch shape of torch.Size() and I want to classify them (there are only two classes). I'm basically trying to combine a CNN with an LSTM. Here's the code to build an LSTM from scratch from lesson 8. Any suggestions on how to modify it so that it can process images?

```python
class LMModel6(Module):
    def __init__(self, vocab_sz, n_hidden, n_layers):
        self.i_h = nn.Embedding(vocab_sz, n_hidden)
        self.rnn = nn.LSTM(n_hidden, n_hidden, n_layers, batch_first=True)
```

I see that self.i_h = nn.Embedding(vocab_sz, n_hidden) has to be changed, and I guess that a ResBlock has to be put there instead, but I don't seem to find any answer on how to do it.

A batch of 8 grayscale images (1 channel) of size 128x128 would have shape (8, 1, 128, 128), so a sequence of images would have an extra dimension to account for time. You have various approaches to this problem. One is to encode the images independently, before the recurrent layer, into a latent feature space, let's say of size 128; an image would then be a vector of shape (1, 128), and you could feed a traditional nn.LSTM layer (not the one above). Another approach is to modify Jeremy's LSTM to accept sequences of images, i.e. integrate the convolutional layer into the recurrent part; you would obtain a convolutional recurrent layer, a.k.a. ConvLSTM. Both approaches are sketched at the end of this thread. I have been playing with this type of problem for a while; check my examples for video classification using a CNN -> LSTM approach, and for a CNN -> ConvLSTM/GRU approach (also transformer, PhyCell and seq2seq).

Thomas's code gives you (and me) some great examples and starting points. Please excuse me if I make any wrong assumptions about what you already understand, but I think it's important that you have a clear concept of what you are trying to do. RNNs are typically applied to a time dimension. The essence is that what has gone before is the cause of what comes after: the RNN learns that causal relationship in its hidden state and gives you a conclusion, a prediction or a class, after seeing the whole sequence. (You could in theory apply an RNN to the frequency dimension, but there is not much of a cause-and-effect relationship there, and there are more effective ways to extract its features.) It looks like you want to classify time sequences (length 398709) made of elements, each of which you are calling an image. If this is right, then you are running a sequence of "images" through an RNN to classify the sequence, not treating the pixels of a single image as a sequence. To get some clarity, stop calling these things images: images are inherently two-dimensional, with structure along both dimensions. Instead, think of them as vectors of 128 features arranged in a time sequence. Each single element of your training set is therefore 128 features by 398709 time steps.
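To make that last framing concrete, here is a minimal sketch (the sizes and variable names are illustrative, not from the thread): each training example becomes a (time_steps, 128) tensor, which nn.LSTM consumes directly when batch_first=True.

```python
import torch
import torch.nn as nn

# Layout for nn.LSTM with batch_first=True: (batch, time, features).
# A stand-in length of 1000 is used here; the real 398709-step sequences
# would be chunked or truncated (e.g. truncated BPTT) in practice.
x = torch.randn(2, 1000, 128)                      # 2 sequences of 128-feature vectors
rnn = nn.LSTM(input_size=128, hidden_size=64, batch_first=True)
out, (h_n, c_n) = rnn(x)                           # out: (2, 1000, 64)
summary = out[:, -1]                               # last step summarizes the whole sequence
logits = nn.Linear(64, 2)(summary)                 # two-class prediction -> (2, 2)
```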
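Going back to the image-sequence reading, the first approach from the reply above (encode each frame to a latent vector, then run a plain nn.LSTM) might look like the sketch below. The module name and layer choices are my own illustrative guesses, not code from the lesson or the linked repos.

```python
import torch
import torch.nn as nn

class CNNEncoderLSTM(nn.Module):
    """Encode each grayscale frame with a small CNN, then classify the sequence with an LSTM."""
    def __init__(self, latent_sz=128, n_hidden=128, n_layers=2, n_classes=2):
        super().__init__()
        # Per-frame encoder: (1, 128, 128) -> vector of size latent_sz
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # -> (16, 64, 64)
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # -> (32, 32, 32)
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # -> (64, 16, 16)
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),                 # -> (64,)
            nn.Linear(64, latent_sz),
        )
        self.rnn = nn.LSTM(latent_sz, n_hidden, n_layers, batch_first=True)
        self.head = nn.Linear(n_hidden, n_classes)

    def forward(self, x):                      # x: (batch, seq_len, 1, 128, 128)
        b, t = x.shape[:2]
        feats = self.encoder(x.reshape(b * t, *x.shape[2:]))  # (b*t, latent_sz)
        out, _ = self.rnn(feats.view(b, t, -1))               # (b, t, n_hidden)
        return self.head(out[:, -1])           # classify from the last time step

model = CNNEncoderLSTM()
logits = model(torch.randn(8, 10, 1, 128, 128))  # a batch of 8 ten-frame clips -> (8, 2)
```

This plays the role the question asks about: the CNN encoder replaces nn.Embedding as the thing that turns each sequence element into an n_hidden-sized vector.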
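The second approach, ConvLSTM, replaces the matrix multiplies inside the LSTM gates with convolutions, so the hidden state keeps its spatial layout. Here is a minimal cell, assuming you drive it with your own loop over time steps; again an illustrative sketch, not the implementation from the repos mentioned above.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """A single ConvLSTM step: gates are convolutions, so h and c stay (C, H, W) maps."""
    def __init__(self, in_ch, hidden_ch, k=3):
        super().__init__()
        # One convolution produces all four gates (input, forget, cell, output) at once.
        self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state                                    # each (batch, hidden_ch, H, W)
        i, f, g, o = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = f.sigmoid() * c + i.sigmoid() * g.tanh()    # standard LSTM cell update
        h = o.sigmoid() * c.tanh()
        return h, c

# Drive the cell over the time dimension by hand:
seq = torch.randn(8, 10, 1, 128, 128)                   # (batch, time, ch, H, W)
cell = ConvLSTMCell(in_ch=1, hidden_ch=16)
h = c = torch.zeros(8, 16, 128, 128)
for t in range(seq.shape[1]):
    h, c = cell(seq[:, t], (h, c))
# h now summarizes the clip; pool it and add a linear head for the 2-class output.
```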