Part 3: Single cell genome assembly

Questions:

Q3.1: Did you notice how many read pairs from HiSeq or MiSeq data were merged by SeqPrep?
Is there a reason why one data set has a higher merge rate than the other? How many reads were discarded in the process?

Q3.2: Did you notice any differences between HiSeq and MiSeq data assembled using both --sc and --careful flags, i.e., default?

Q3.3: Did you notice any differences in the quality of assembly when --sc and --careful were omitted in each data set?

Q3.4: What do you think is the best way to assess the 'quality' of an assembly? (e.g. total size, N50, number of predicted ORFs, completeness)
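
As a reminder of what N50 measures, here is a minimal Python sketch. It is not part of the workshop pipeline; the function name n50 and the example contig lengths are hypothetical and used for illustration only. N50 is the length of the contig at which the running total, summed from the longest contig downwards, first reaches half of the total assembly size.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    # Walk through contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to at least
        # half of the total assembly size defines N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths in bp, for illustration only.
example_lengths = [100, 200, 300, 400, 500]
print("total size:", sum(example_lengths))
print("N50:", n50(example_lengths))
```

Note that N50 only describes contiguity. An assembly with a high N50 can still be incomplete or misassembled, which is why it is worth comparing it against the other metrics listed in Q3.4.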

Q3.5: What do you think is the best way to assemble this particular SC dataset? Why?

Q3.6: What is the identity of the organism based on the analyses you have performed? What phylum does it belong to, and are there any closely related organisms in the databases?

Q3.7: Try to find out in what type of environment you might find similar organisms.

Note: If you have completed the exercises and still have plenty of time, you can start Part 6, which is similar to this exercise but uses MiSeq data.