Let's get straight to the controversial bit: email address validation. A penny-drop moment during this week's video was that the native browser address validator rejects many otherwise RFC compliant forms. As an example, I asked ChatGTP about the validity of the pipe symbol during the live stream and according to the AI, it's permissible "when properly quoted":
"john|doe"@example.com
Give that a go and see how far you get in an input of type "email". Mind you, that example allows a pipe when not quoted. And the more you read, the more contradictory things seem; try this Stack Overflow question about allowable characters in an address and you'll get a heap of "yeah, that one is allowed but only if quoted"... which means it won't work in an email input box! (Unless you use the "pattern" attribute and a regex that permits it - argh!)
tl;dr - especially for the purpose in question - extracting email addresses from a data dump - I think I'm just going to boilthis down to a handful of permissible characters that are broadly accepted by websites and just stick with those. If you're a unique enough snowflake to be putting a quoted pipe in your alias then you're clearly not signing up to very many websites.
Click to Open Code Editor