Humans have an astonishing ability to remember previously viewed scenes with high fidelity, including robust memory for visual detail (Konkle et al., 2010). To better understand the mechanism that affords this massive memory, we investigated the automaticity of encoding into visual long-term memory. We studied this in two ways across three experiments. First, we measured the effect of limiting encoding time by varying the time allotted to encode each image while keeping overall study time constant. Second, we measured the effect of an attentionally demanding concurrent task on subsequent retention by systematically varying the level of demand imposed by the concurrent executive task. If encoding is automatic, neither shorter exposure nor concurrent demand should influence subsequent recognition. If executive attention is required, then memory performance should decline as load increases and encoding time decreases. We tested scene memory using a standard massive memory paradigm with a heterogeneous (452 real complex scenes) and a homogeneous (304 doors) image set, examining whether encoding time and concurrent tasks affect scene memorability. Even when encoding a very rich heterogeneous set of images, encoding time mattered, with a significant reduction in performance from 3 seconds (d' = 1.11) to 1 second (d' = .60) of study time. Interestingly, a further reduction in encoding time to 0.5 seconds produced no significant decrement, suggesting that encoding might follow a two-step process. Further, high-fidelity encoding was reduced when there was a competing working memory task during encoding, both for the heterogeneous (d' = .91 for no load to d' = .41 for high load) and the homogeneous (d' = .65 for no load to d' = .21 for high load) image sets, with an amplified reduction when there is little idiosyncratic detail in the image. Results suggest that visual recognition memory encoding is the outcome of a two-stage rather than an automatic process. Meeting abstract presented at VSS 2015.
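
The d' values reported above are sensitivity indices from signal detection theory. The abstract does not state how they were computed, so the following is only a minimal sketch under the assumption of an old/new (yes/no) recognition test, where d' = z(hit rate) - z(false-alarm rate). The function name `d_prime` and the example rates are hypothetical illustrations and are not data from the study.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index d' = z(H) - z(FA), assuming an old/new recognition test.

    Both rates must lie strictly between 0 and 1; the inverse normal CDF
    is undefined at exactly 0 or 1.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Illustrative rates only (assumed, not taken from the abstract).
example = d_prime(hit_rate=0.75, false_alarm_rate=0.25)
print(f"d' = {example:.3f}")  # prints "d' = 1.349"
```

Under this assumed formula, chance performance (equal hit and false-alarm rates) gives d' = 0, and larger values indicate better discrimination of studied from unstudied images.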