Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Is there a coherent resource (not a scattered 'just google it' series of guides from all over the place) that encapsulates some of the concepts and workflows you're describing? What would be the best learning site/resource for arriving at understanding how to integrate and manipulate SD with precision like that? Thanks


I have found http://stable-diffusion-art.com to be an absolutely invaluable (and coherent) resource. It's highly ranked on Google for most "how to do X with stable diffusion" style searches, too.


> What would be the best learning site/resource for arriving at understanding how to integrate and manipulate SD with precision like that?

Honestly? Probably YouTube tutorials.


Jaysus.

I'm going to sound like an entitled whiny old guy shouting at clouds, but - what the hell; with all the knowledge being either locked and churned on Discord, or released in form of YouTube videos with no transcript and extremely low content density - how is anyone with a job supposed to keep up with this? Or is that a new form of gatekeeping - if you can't afford to burn a lot of time and attention as if in some kind of Proof of Work scheme, you're not allowed to play with the newest toys?

I mean, Discord I can sort of get - chit-chatting and shitposting is easier than writing articles or maintaining wikis, and it kind of grows organically from there. But YouTube? Surely making a video takes 10-100x the effort and cost, compared to writing an article with some screenshots, while also being 10x more costly to consume (in terms of wasted time and strained attention). How does that even work?


I've been playing with SD for a few months now and have only watched 20-30m of YT videos about it. There's only a few worth spending any time watching, and they're on specific workflows or techniques.

Best just to dive in if you're interested IMO. Otherwise you'll get lost in all the new jargon and ideas. Great place to start is the A1111 repo, lot of community resources available and batteries included.


How does anyone keep up with anything? It's a visual thing. A lot of people are learning drawing, modeling, animation etc in the exact same way - by watching YouTube (a bit) and experimenting (a lot).


Picking images from generated sets is a visual thing. Tweaking ControlNet might be too (IDK, I've never got a chance to use it - partly because of what I'm whining about here). However, writing prompts, fine-tuning models, assembling pipelines, renting GPUs, figuring out which software to use for what, where to get the weights, etc. - none of this is visual. It's pretty much programming and devops.

I can't see how covering this on YouTube, instead of (vs. in addition to) writing text + some screenshots and diagrams, makes any kind of sense.


This isn't for Stable Diffusion, but I wanted to provide a supplemental to my comment: https://kaiokendev.github.io/til

This is the level we're generally working at - first or second party to the authors of the research papers illustrating implementations of concepts, struggling with the Gradio interface, things going straight from commit to production.

It's way less frustrating to follow all of the authors in the citations of the projects you're interested in than wasting your attention sorting through blogspam, SEO, and YT trash just to find out they don't really understand anything, either.


Thank you. I was reluctant to chase after and track first-party research directly, or work directly derived from it, as my limited prior experience told me it's not the most efficient thing unless I want to go into that field of research myself. You're changing my mind about this; from now, I'll try sticking close to source.


There's a relatively thin layer between the papers and implementations, which is another way of saying this stuff is still for researchers and assumes a requisite level of background with them. It sounds like you'd benefit from seeking out the first party sources.

This is where video demonstrations come in handy. Since many concepts are novel, it's uncommon to find anyone who deeply understands them, but it's very easy to find people who have picked up on some tricks of the interfaces, which they're happy to click through. I think gradio/automatic1111 makes learning harder than it needs to be by hiding what it's doing behind its UI, while eg- comfyui has a higher initial learning curve but provides a more representational view of process and pipelines.


Take a moment and go scroll through the examples at civitai.com. Does most of them strike you as something by people with jobs? Most of them are pretty juvenile, with pretty women and various anime girls.


Are you under the impression that people with jobs don't like pretty women and anime girls?


Of course not, but it looks like a teenage boy's room.


An operative word here is people.... the set "people with jobs" contains a far higher fraction of folks who like attractive men than is represented here....


The difference being that youtube videos can make more money for the author. Anyway, it's all open source, so feel free to make a wiki


I would if I could keep up with the videos :).


I think it'd have been convenient for me as well if the AI tool that has access to YouTube videos would've been able to answer queries . But it takes 5 minutes to reply and I forgot it's name. It was on the front page recently


I mostly agree, but in this case it can be genuinely useful to actually see the process of someone using the tool effectively.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: