The unconditional LOOP instruction (the plain form, as opposed to LOOPE/LOOPNE) takes longer to execute than the equivalent two-instruction sequence that decrements the count register and jumps if the count is not zero.
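As a rough illustration of the two forms being compared, the sketch below writes the same counted loop both ways. It assumes GCC or Clang extended inline assembly on an x86 target and falls back to plain C elsewhere. The function names and the HAVE_X86_ASM macro are hypothetical, chosen only for this example. It shows the instruction sequences only and does not measure timing.

```c
#include <stdint.h>

/* Hypothetical macro: assumes GCC/Clang extended asm on an x86 target. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_X86_ASM 1
#else
#define HAVE_X86_ASM 0
#endif

/* Runs n iterations using the single LOOP instruction. */
uint32_t iterate_with_loop(uint32_t n)
{
    uint32_t iterations = 0;
    if (n == 0)
        return 0;               /* a zero count would wrap around, so skip it */
#if HAVE_X86_ASM
    uintptr_t count = n;        /* LOOP counts in ECX (RCX in 64-bit mode) */
    __asm__ volatile(
        "1:\n\t"
        "incl %0\n\t"           /* loop body: record one iteration */
        "loop 1b\n\t"           /* decrement count register, jump if nonzero */
        : "+r"(iterations), "+c"(count)
        :
        : "cc");
#else
    for (uint32_t i = n; i != 0; --i)   /* portable fallback, no assembly */
        iterations++;
#endif
    return iterations;
}

/* Runs n iterations using the two-instruction DEC / JNZ sequence. */
uint32_t iterate_with_dec_jnz(uint32_t n)
{
    uint32_t iterations = 0;
    if (n == 0)
        return 0;
#if HAVE_X86_ASM
    uintptr_t count = n;        /* any general register can hold the count */
    __asm__ volatile(
        "1:\n\t"
        "incl %0\n\t"           /* loop body: record one iteration */
        "dec %1\n\t"            /* decrement the count register */
        "jnz 1b\n\t"            /* jump back while the count is not zero */
        : "+r"(iterations), "+r"(count)
        :
        : "cc");
#else
    for (uint32_t i = n; i != 0; --i)   /* portable fallback, no assembly */
        iterations++;
#endif
    return iterations;
}
```

Both functions perform the same number of iterations. One difference to keep in mind when substituting one form for the other: LOOP leaves the flags untouched, while DEC modifies them (except CF), so code that relies on flags surviving across the loop branch needs checking.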
All branches are converted into 16-byte code fetches regardless of jump address or cacheability.