In general, it would be good to use the GCM data that will be available through the Climate Data Store (see: http://climateservice-global.eu/about-the-data/) and, until then, the models that are available on the ESGF portal. This choice already gives you a preselection of models.
Otherwise, you should use realizations that are consistent between the historical and scenario simulations; the available realizations sometimes differ between the two. Beyond that, there is no clear rule. To preserve the ensemble spread, it would be good to use as many ensemble members as possible. As this is often not feasible, it is definitely good to use more than one realization and/or more than one model. The selection often also depends on the research question. If you are interested in one specific variable, select those models and realizations whose historical simulations capture the main features of that variable. In other cases, the main circulation patterns, e.g. NAO or ENSO, may be of interest. Even then, the ensemble spread should be preserved.
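The consistency check described above can be sketched in a few lines of Python. This is only an illustration: the `records` list, the model names `MODEL-A` and `MODEL-B`, and the function `consistent_members` are hypothetical, and in practice the list would be built from whatever the ESGF search returns for your models.

```python
from collections import defaultdict


def consistent_members(records, scenario):
    """Map each model to the members present in both 'historical' and `scenario`."""
    by_model = defaultdict(lambda: defaultdict(set))
    for model, experiment, member in records:
        by_model[model][experiment].add(member)

    result = {}
    for model, experiments in by_model.items():
        common = experiments.get("historical", set()) & experiments.get(scenario, set())
        if common:  # skip models without any matching realization
            result[model] = sorted(common)  # plain alphabetical order
    return result


# Hypothetical catalogue entries, for illustration only.
records = [
    ("MODEL-A", "historical", "r1i1p1"),
    ("MODEL-A", "historical", "r2i1p1"),
    ("MODEL-A", "rcp85", "r1i1p1"),
    ("MODEL-B", "historical", "r1i1p1"),
    ("MODEL-B", "rcp85", "r3i1p1"),
]

selection = consistent_members(records, "rcp85")
print(selection)
```

Here only `MODEL-A` with `r1i1p1` is kept, because `MODEL-B` has no realization that exists in both experiments. Which of the remaining members to use is still a judgment call that depends on the research question.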
Yes, if you take the GCMs listed at http://climateservice-global.eu/about-the-data/, the r1i1p1 realization is available for all of them on the ESGF portal, and you can use that as well. This is also what we currently use for calculating the climate impact indicators, except for EC-EARTH, for which we take the r12i1p1 realization. But in general, as mentioned above, there is no clear rule for which realizations to use.
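If you want to encode this convention in a script, a minimal sketch could look as follows. The names `DEFAULT_MEMBER`, `MEMBER_OVERRIDES` and `member_for` are hypothetical, as is the placeholder model name `SOME-GCM`; only the r1i1p1 default and the EC-EARTH exception come from the answer above.

```python
# Default realization used for the listed GCMs.
DEFAULT_MEMBER = "r1i1p1"

# Exceptions to the default; EC-EARTH uses r12i1p1.
MEMBER_OVERRIDES = {"EC-EARTH": "r12i1p1"}


def member_for(model):
    """Return the realization to use for `model`, falling back to the default."""
    return MEMBER_OVERRIDES.get(model, DEFAULT_MEMBER)


# "SOME-GCM" is a placeholder name, not a real model.
for name in ("EC-EARTH", "SOME-GCM"):
    print(name, member_for(name))
```

Keeping the exceptions in one dictionary makes it easy to add further overrides if another model turns out to need a different realization.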