tag:blogger.com,1999:blog-684443317148892945.post3664426104027870939..comments2024-02-29T23:54:39.092-08:00Comments on RDKit: RDKit Knime Workflows V: Building a predictive modelgreg landrumhttp://www.blogger.com/profile/10263150365422242369noreply@blogger.comBlogger3125tag:blogger.com,1999:blog-684443317148892945.post-46344545657925971602015-07-17T02:05:13.402-07:002015-07-17T02:05:13.402-07:00Hi,
I tried to download this workflow from the pr...Hi,<br /><br />I tried to download this workflow from the provided link, but it seems to no longer work. Would you be able to provide it again?<br /><br />Many thanks,<br /><br />AngusAngushttps://www.blogger.com/profile/13397648201750161531noreply@blogger.comtag:blogger.com,1999:blog-684443317148892945.post-47422431221304415302014-02-23T20:15:35.140-08:002014-02-23T20:15:35.140-08:00Looks like you understood the settings perfectly. ...Looks like you understood the settings perfectly. :-)<br />And, yes, it's picking from a random selection of the square root of the number of features at each node.<br /><br />The only parameters here that I normally do much tweaking of other than the fingerprint are the number of trees and the max depth of the trees. I did spend some time exploring parameter space while putting this together, which is how I landed on a depth of 15.<br /><br />Since I ended up using out-of-bag classification for validation, I really should have increased the number of trees: each final prediction is being made by, on average, only 70 trees. That feels a bit light to me.greg landrumhttps://www.blogger.com/profile/10263150365422242369noreply@blogger.comtag:blogger.com,1999:blog-684443317148892945.post-41752590853167021822014-02-23T15:38:24.379-08:002014-02-23T15:38:24.379-08:00I probably misunderstand the KNIME settings but it...I probably misunderstand the KNIME settings but it looks like you're doing a 30% holdout with 100 trees generated with 15 max levels? I couldn't quite tell what the settings for the max number of features considered for each split was but the default usually tends to be the square root of the number of supplied features for classification problems.<br /><br />It might be interesting to see what changing some of these variables might have on the classification accuracy. :)tantrevhttps://www.blogger.com/profile/02397131628997802859noreply@blogger.com