Jason Paul Michaels Asks: EXIFTOOL set MP4 to earliest date recursive - Can it be done?

I've been wrangling nearly 40 TB of videos that were recovered from 2 crashed RAIDs. Nearly all files have a correct EncodeDate, TrackCreateDate and/or MediaCreateDate, but I biffed a mass rename using the wrong EXIF commands and ended up with incorrect file names, FileModifyDates and CreateDates. I essentially want to run a recursive command that will run through the assets, pull the oldest date & time from all the available tags, and then either set the FileModifyDate or CreateDate OR at the very least just update the filename. I can get to the result I want through either of those. I've found several posts pertaining to image files for something like this, but not video files. I do know and use ExifTool daily to modify video assets, though. When trying to run the script below, or botched versions I've tried to create (I have zero scripting knowledge and have been trying to learn), no matter what I do I cannot get it to pull the oldest date and apply it to all the other tags.

heathenadmin% #!/bin/tcsh -f
heathenadmin% exiftool -L -overwrite_original -api "Filter=s/ä/ae/g s/ö/oe/g s/ü/ue/g s/Ä/Ae/g s/Ö/Oe/g s/Ü/Ue/g s/ß/ss/g" -TagsFromFile -all:all.
heathenadmin% # add ", " to the caption:
heathenadmin% exiftool '-caption-abstract LANDSCAPE/untitled folder copy/untitled folder copy]
heathenadmin% # set all dates to the earliest date
heathenadmin% set earliest_date="`exiftool -CreateDate -fileorder CreateDate -q -s3.
heathenadmin% set latest_date="`exiftool -CreateDate -fileorder -CreateDate -q -s3.
heathenadmin% if ( "$earliest_date" = "$latest_date" ) then
heathenadmin% else
else? echo "earliest date is $earliest_date and latest date is $latest_date"
else? echo "setting -CreateDate to $earliest_date and TimeCreated to unknown"
else? exiftool -CreateDate=$earliest_date.

And here is the EXIF date on the test file.

If anyone can provide any insight it would be a massive help in not pulling the rest of my hair out.

William Asks: Dimensionality Reduction of Categorical Variables in Spark

First off, I hope this question hasn't been asked already. I've found questions regarding the use of PCA vs. MCA in the reduction of categorical variables, but I've yet to see a solution that involves the Spark/PySpark framework and APIs.

The issue that I'm running into is that I'm working with an extremely large dataset - some 26 million rows by around 4000 columns. The columns are a mixture of both categorical and continuous variables. My general understanding is that PCA is best used in the analysis of continuous variables, so I don't mind separating the continuous from the categorical and applying PCA to that subset. My question concerns the analysis of the remaining categorical variables.

The two dimensionality reduction methods that are available in Spark are SVD and PCA. Given that PCA is no good, we're left with SVD. I know that SVD works well with sparse data (and one-hot encoding categorical variables certainly makes for some sparse data), but I can't seem to find any sort of mathematical support for using SVD when the sparse data is, effectively, full of binaries that don't display correlation.

I suppose my question is as follows: is there a mathematical justification for using SVD on categorical (read: one-hot encoded) data? If there isn't, what would the next best approach be within the context of massive datasets that require Spark for any sort of meaningful progress to be made?

Doron Grossman-Naples Asks: Why do we care about the eigenvalues of the Frobenius map?
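On Jason's ExifTool question: the core of the task is choosing the oldest usable timestamp among several date tags and writing it back. Below is a minimal Python sketch of that selection logic only, assuming the tag values for one file have already been read into a dictionary (for example from `exiftool -j` output). The helper names `parse_exif_date`, `earliest_date` and `build_command` are hypothetical, and the sketch builds the argument list without running ExifTool.

```python
from datetime import datetime

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"


def parse_exif_date(value):
    """Return a datetime for an ExifTool-style date string, or None if unusable."""
    try:
        # Use only the first 19 characters so a trailing time zone or
        # sub-second suffix does not break parsing.
        return datetime.strptime(value[:19], EXIF_FORMAT)
    except (TypeError, ValueError):
        # Covers missing values and zeroed dates such as 0000:00:00 00:00:00.
        return None


def earliest_date(tags):
    """Return the oldest parseable date among the tag values, or None."""
    parsed = [d for d in (parse_exif_date(v) for v in tags.values()) if d is not None]
    return min(parsed) if parsed else None


def build_command(path, tags):
    """Build an exiftool argument list that writes the oldest date to the file."""
    oldest = earliest_date(tags)
    if oldest is None:
        return None  # nothing trustworthy to write
    stamp = oldest.strftime(EXIF_FORMAT)
    return [
        "exiftool",
        "-overwrite_original",
        f"-CreateDate={stamp}",
        f"-FileModifyDate={stamp}",
        path,
    ]
```

Returning an argument list rather than a shell string sidesteps quoting problems with paths such as "untitled folder copy". The list could be passed to `subprocess.run` once the output has been checked on a copy of a test file.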