IchiruTake/Bit2Edge: AIPBDET
Description
This release fixes several issues and brings some performance improvements.
- [Hidden Changed]: Learning.py: some attribute names were changed; results are unchanged.
- [API Changed]: Learning.py: "getModelRatio" has been converted to "prepare_model". It adds the hyper-parameters "DataCleaning" and "strict_cleaning", which behave exactly as in "DataGenerator.createData()". They only work if 'retrainModel' is False. Note that using 'retrainModel' is typically dangerous because it does not pass any validation. This extra cleaning is applied after all values have been finalized, and should only be used if you want to apply Data Cleaning to the training set only (NOT to the validation set or testing set).
- [Bug Fixed]: Creator.py: "DataGenerator.createData()": "DataCleaning" did not work with strict_cleaning=True. It can now be applied.
- [Behaviour Changed]: Predict.py: the "visualize" method no longer takes "axis-number" as a hyper-parameter.
- [Behaviour Changed]: Predict.py: the "visualize" method now accepts some extra values for "chosen_type" (compared through chosen_type.lower()):
  - "last" --> Remains the same.
  - "fingerprint" ("fingerprints", "multi-fingerprints") --> Remains the same, but the coloring representation can be selected. Computes all fingerprints and merges them into three standalone features.
  - "single" --> Computes each 'selected' environment fingerprint and returns one value for each.
  - "meaning" ("nature", "attribute") --> Computes the large (Full Structural) and smaller (Radicals) environments as two values for representation.
- [Method Changed]: Predict.py: "visualize": the hyper-parameter "decomposeMethod" now uses "umap" as the default, with n_neighbors equal to twice the number of unique bonds recorded in the input dataset.
- [Minor Performance Boost]: Data-type validation is now 15% - 25% faster thanks to a syntax change (type(a) is B or type(a) is C --> isinstance(a, (B, C))), although the overall gain is not significant. A test case with 50M loops shows OLD syntax: 6.5515s <--> NEW syntax: 5.4085s, with 10% lower memory cache. Applied to all Python files.
- [API Changed]: The hyper-parameter that varied between "FileName" and "filePath" is now merged as "FileName".
- [Method Changed]: Creator.py: the method "DataGenerator.getFeatureColumn()" now has another hyper-parameter, "number_of_input", which is directly attached to ".createData" and returns the column file.
- [Performance Boost]: Predict.py: class "PredictModel" introduces a new method ".generate()", directly attached to ".createData()", which builds all of the configuration used by other class methods. This saves 0.1 - 0.3 seconds by locating special positions once instead of searching on every call, at a cost of only 4 KB extra ---> Better Design Pattern For Debugging.
- [Documentation]: PredictModel now prints some extra lines to separate method sections in the IDE console / terminal, to make verification easier.
- [Attribute Changed]: class "PredictModel": self.n_bondType_Saved is now corrected to self.bondType_Saved.
- [Performance Boost]: the function "PredictModel.generate()" is now 15-20% faster than the older version (from 4.4-4.6 s to 3.5-3.7 s) by using built-in Python methods rather than numpy.char.
- [Hidden]: the function "PredictModel.generate()" is now called only once. All new variables are saved for importing new datasets. This is extremely useful when predicting multiple datasets with the same or slightly different configuration.
- [NEW]: the function "PredictModel.estimate_density()" is introduced to estimate the relationship between fingerprint status and BDE estimation. This function returns a DataFrame first and a Series second (pandas library).
  - "sparsity_mode": bool (default to False). If True, the density array is changed into "sparsity".
  - "percentage_mode": bool (default to False). If True, all of the data is multiplied by 100.
  - "dataType": numpy.dtype (default to numpy.float32). Don't select an integer type as dataType, as all results would be turned into zero. Suggestion: float, np.float16, np.float32, ... (np.float64, TF.dtype or PyTorch.dtype are not recommended).
  - "output": str (default to None). The path and name of the .csv file to create. If the filename ends with ".csv", only the DataFrame is written to a csv file. Otherwise, both the DataFrame and the Series are written to two .csv files with an extra name appended.
- [Bug Fixed]: the function "PredictModel.process_available_data()" now accepts self.number_of_input=4.
- [Behaviour Changed]: "PredictModel.estimate_density()" now returns two DataFrame(s) instead, and both are saved on the instance. In this mode, some extra user keyword arguments behave the same as the other keywords but apply to the density matrix only.
- [New Feature]: "PredictModel.visualize()" now accepts new values for "chosen_type": "density" or "sparsity". However, this method creates a new density DataFrame only if one was not already stored by a call to ".estimate_density()". If you want to rebuild it, use ".reset_density()".
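The density options and the build-once behaviour described in the entries above can be illustrated with a minimal sketch. It is not Bit2Edge's implementation: the class name `DensityCache`, the `builds` counter, and the definition of density as the fraction of non-zero bits per fingerprint column are all assumptions made for illustration, and plain NumPy arrays stand in for the pandas objects.

```python
import numpy as np


class DensityCache:
    """Hypothetical stand-in, NOT Bit2Edge's PredictModel."""

    def __init__(self, fingerprints):
        # Rows are samples, columns are fingerprint bits (0 or 1).
        self._fingerprints = np.asarray(fingerprints)
        self._density = None  # computed on first request, then reused
        self.builds = 0       # counts how often the density was rebuilt

    def estimate_density(self, sparsity_mode=False, percentage_mode=False,
                         dtype=np.float32):
        if self._density is None:
            # Assumed definition: share of non-zero bits in each column.
            n_rows = self._fingerprints.shape[0]
            self._density = np.count_nonzero(self._fingerprints, axis=0) / n_rows
            self.builds += 1
        result = self._density
        if sparsity_mode:
            result = 1.0 - result   # sparsity is the complement of density
        if percentage_mode:
            result = result * 100   # express as a percentage
        return result.astype(dtype)

    def reset_density(self):
        # Drop the stored array so the next call recomputes it.
        self._density = None


cache = DensityCache([[1, 0, 0, 1],
                      [0, 0, 1, 1],
                      [1, 0, 0, 0],
                      [0, 0, 0, 1]])
print(cache.estimate_density())                    # fractions per column
print(cache.estimate_density(sparsity_mode=True))  # complement of the above
print(cache.estimate_density(dtype=np.int32))      # fractions truncate to 0
```

The last call shows why an integer dataType is discouraged: every fraction below 1 is truncated to zero when cast. Repeated calls reuse the stored array until `reset_density()` is called.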
Note: The next batch will introduce documentation to give a better understanding of scaling and reproducing.
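The type-check change listed under [Minor Performance Boost] can be sketched as follows. The helper names `check_old` and `check_new` are hypothetical, and `int` and `float` stand in for the B and C of the changelog. Note that the two forms are not strictly equivalent: isinstance also accepts subclasses, which an exact type comparison rejects.

```python
import timeit


def check_old(a):
    # Old style: exact type comparison, one test per accepted type.
    return type(a) is int or type(a) is float


def check_new(a):
    # New style: a single isinstance call with a tuple of types.
    return isinstance(a, (int, float))


# Both forms agree on plain ints, floats and unrelated types.
for value in (3, 2.5, "text", None):
    assert check_old(value) == check_new(value)

# They differ on subclasses: bool is a subclass of int.
print(check_old(True), check_new(True))  # False True

# Rough timing; absolute numbers depend on the machine and Python version.
old_time = timeit.timeit(lambda: check_old(2.5), number=200_000)
new_time = timeit.timeit(lambda: check_new(2.5), number=200_000)
print(f"old: {old_time:.4f}s  new: {new_time:.4f}s")
```

If subclass instances must be rejected, the exact-type form is still the one to use; otherwise the isinstance form is the more idiomatic choice.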
Files
Name | Size | Checksum |
---|---|---|
IchiruTake/Bit2Edge-1.2.2-beta.5.zip | 298.9 kB | md5:07af3e6924d2d83fa1def36831b565e5 |
Additional details
Related works
- Is supplement to
- https://github.com/IchiruTake/Bit2Edge/tree/1.2.2-beta.5 (URL)