Friday, February 12, 2021

Which course is better to build a good foundation, Harvard's CS50 or MIT's 6.00.1x?

As a learner with zero programming knowledge or skills, which course builds a better foundation in computer science (CS)?
A quick Google search on this question turns up two very popular introductory CS courses from top US universities: Harvard's CS50x and MIT's 6.00.1x. The question is which of the two a new learner should take first, and whether it is necessary to take both.
In this post, we gather the views of students who have taken one or both courses, so that readers can make an informed choice.

QUORA

Thanks for the question. I took CS 50 around five years ago. The skeleton still seems similar, although it has since moved beyond Harvard and is now offered at Yale too.
So, CS 50 is broad. It covers a lot of ground, and you get an overall idea of what's possible and happening in CS today. The scope of the final project is huge: I designed an application that checked and optimised stock prices. You could do a lot more; for example, if you bought some hardware, like Google Glass or a Leap, you could reprogram it to suit your needs.
I also did MIT's 6.00.1x, the intro to CS with Python. That is, in some sense, a no-nonsense course. You get to learn the most popular language, Python, at a very authentic computer scientist's level. You'll learn OOP, a little complexity analysis, searching, sorting, and much more. Sure, the professors are less entertaining, in the sense that they're not as dramatic as David Malan and might not go to his lengths to make things clear, but Eric and John are exceptional teachers and the concepts they teach are crystal clear. After taking this course you can try actual computer science courses, like algorithms and then automata theory.
Hence, if you want a good overview of CS, go for CS 50. If you want to learn beyond basic programming too, like maybe want to venture into data science or computer science, go for MIT's course.
If you've time, do both, first CS 50 and then MIT.
Good luck.

I have done both 6.00.1x and 6.00.2x. They are both great, hard, hard work, and I recommend them. I have not done CS50, only the first 1/12th of it, yet I am seriously considering it (to have the most solid introduction to CS possible). They both provide an introduction to computer science (CS). In my view, there are two things to consider: content and deadlines.

If you want to learn Python, or if you need deadlines to sustain performance, go with the MIT course(s). They won’t distract you with anything except Python and some Python-based big data tools in the second MOOC. Do both of them if you can. CS50 did not use to cover Python, but it has since about 2018. If you want some web development, namely JavaScript (JS), go with the Harvard course (you will certainly not find web dev in the MIT course). Harvard’s Malan teaches HTML, CSS, JS, C, and Python, and ends with Flask, a Python-based web micro-framework. (They dropped PHP in the same year. Yet, given the orientation of CS50, you will still hear about it there.)

My final note is about deadlines. People who lack study skills might find that 6.00.1x whizzes by (there are two runs of the 6.00.1x and 6.00.2x pair a year); each run has deadlines (for example, every Wednesday), and you need to be well organized. My initial trouble was that the course whizzed by, but I simply took the very next run and finished it very well, along with 6.00.2x, which followed immediately after. As far as I am concerned, the deadlines are another plus of the MIT courses over Harvard’s CS50.

CS50 has one deadline, Dec 31 every year, and even that might be flexible, as one can carry over the parts finished in, for example, 2018 to the 2019 run. I found I would have appreciated more frequent deadlines to get into gear in this course; the lack of deadlines perhaps works fine for really good planners, yet it has been a hindrance for me.

Whichever of the two you choose, remember two things. (1) Get involved with your classmates inside the course, and definitely over Facebook or another social platform, short of cheating. That will make you less likely to give up (if you are a rookie) and will help you forge friendships or collegial relationships. (2) If you have some time to read (I had), the textbook is worth it; I appreciated Prof. Guttag’s, in the case of the MITx courses.

I completed MIT 6.00.1x, Introduction to Computer Science and Programming Using Python, last year. This year I took CS50 and cs101.1x on edX. I have done CS50 up to pset5 and am continuing with the remaining psets. So I will try to answer your question.

Coming to the first course that I took, i.e. MIT 6.00.1x.

  • I took this course in my second year, so I was a little bit familiar with coding. I absolutely loved this course. The professors are top-notch. What I liked most was that after each video lecture (mostly 10–15 minutes) you are given an exercise to solve (finger exercises, as the course calls them). You dive into the code straight away, which is very helpful because you implement what you have just learnt. Secondly, the lectures are detailed, and according to the course no programming book is required (although it closely follows a book written by one of its professors); this suits people who prefer to learn just by watching lectures. The finger exercises also improve your problem-solving ability. After the lectures you are given psets that push you further. Another thing is that this course, unlike CS50, is not self-paced, so the time commitment is on the higher side. With psets bound by deadlines, it can be difficult to handle alongside other commitments. I personally struggled a lot to complete this course along with my college studies. Some people leave the course before finishing it because:

  1. The course is offered twice a year but not during the summer/winter vacations, when students have ample time. Even the current offering starts in August. As I said, it is sometimes difficult to dedicate time to it. An archived version of the course is available, but it defeats the purpose of a non-self-paced course.
  2. Sometimes you complete a week’s material before the end of the week and then have to wait for the next week’s material to be released, since you can’t access it in advance.

All in all this is a very good course if you can dedicate enough time to it and complete all the psets/quizzes before deadline. The learning curve is steep and it keeps you on your toes. You will have no problem with the teaching style of professors and other things.

  • Second course: CS50x.

I have completed it up to week 6 (60% of the course), so I will write about my impressions so far. It is the biggest online course on edX. This is probably the closest one can get, virtually, to the hallowed halls of an Ivy League institution. It feels like you have stepped into a live classroom. The course is very well structured. Professor David J. Malan is one of the best professors I have ever seen. His way of teaching keeps you engaged throughout the course. Most importantly, this class is fun, and I cannot overstate this part; in other online courses one may feel bored after watching a 20–30 minute video. CS50’s lectures keep you completely engrossed till the end, and the closing music pumps you up further. The teaching style is phenomenal; the professor uses live examples to demonstrate a concept. The shorts (lectures) are very good. The best thing about it is that it is self-paced and doable. It can be managed alongside other commitments because the only true deadline is the last date of the course. All problem sets, quizzes, and the project can be submitted at any time. Plus, there is a simple schedule, based on the date you first logged in, that suggests dates for all the items you need to submit. Besides C, it teaches many technologies, such as HTML5, CSS3, PHP, JavaScript, a little bit of MySQL, and many other things. It tries to teach many things in one course and succeeds in doing so. The problem sets are fun to do, and it has a great student community always willing to help. You will never feel stuck without any help.


Definitely CS50. It covers a lot of material in exciting ways. You will not only be taught how to program in languages such as C and PHP, but you will also learn how to have fun programming. Besides programming, CS50 also teaches you how to set up virtual machines, get familiar with the command line, do web programming, understand computer architecture, and so on. You won't be overwhelmed; there are short videos and recitations made by the staff. The pace is appropriate for students at all levels. The problem sets are engaging and help you learn a lot. The lecture videos are interesting and exciting.

I'm taking both classes. And I have to say MITx 6 is OK. It is dry and boring. They focus too much on the mechanical stuff of Python and programming. I think the title should be "Intro to Python programming". Instead of showing how things work, the lecture videos just show you slides and some parts of the code demo. As the course dives into more CS topics, I would say it would leave complete beginners confused. I mean, who teaches semantics and the ALU to beginners in the first lecture? :).

In my opinion, CS50 is better structured and better taught. I have learned more from this course than from the Intro to CS course at my university.

So I highly recommend CS50. Period. Anyway, I feel bad to downgrade MIT 6. But I do appreciate the hard work they put into the course. It can be a good companion course for those taking CS50.


REDDIT

Both are great! 6.00.1x uses Python as its language of instruction; CS50 uses C (at least, this was the case when I took them a few years back). With minimal programming experience, they will both be very challenging. I've actually started each and then stopped, and then started again several times. Don't rush through the course! Take your time; give yourself ample time to process all the lecture material. Both courses do a great job teaching you how to 'think' programmatically, which can be a challenge for a lot of people.

As a final thought, I'd recommend auditing the course (taking it for free) as opposed to paying for the certification. I'm not sure the cert would do much for you in the real world. Some will say paying for the course is a great way to stay motivated. In my case, I didn't finish the first time I started, so I was glad I didn't shell out the money. For me personally, money alone is not motivation enough to keep plugging through a challenging course; rather, a willingness to learn something difficult but very cool is what motivates me most.

Whoops, sorry, one more thought: the mistake I made early on was thinking I needed to learn every programming language under the sun. This is NOT the case; in fact, I found this hindered my learning significantly: as soon as I started getting comfortable learning C, I'd switch to Python; once I learned the basics there, I switched to Ruby; then I wanted to build iOS Apps, so I tried Swift; then I wanted to be a strong Web Developer, so I tried JavaScript, etc.

The vast majority of online programming/CS courses follow the same exact structure. You learn the foundations of the language (syntax, variables, loops, etc.), and then you apply it to a project (e.g. Fizz Buzz, write a function that does X, Y, Z, etc.). By jumping from language to language, you do yourself a disservice by only giving yourself a very shallow understanding of each language. Who cares that I can declare variables in Python, Ruby, or C? Who cares that I know how to write loops in 5 different languages? If I'm not immersing myself in a language, if I'm not learning the nitty-gritty mechanics of how things work (and, crucially, WHY they work the way they do), then I'm not really learning the language at all.

So my advice to you would be this: whether it's better to take CS50 over 6.00.1x is not as important as you might think. Both will teach you about the fundamentals of computer programming (albeit with different languages and lessons). With a solid foundation, you can later learn new languages or technologies. It's true that some languages are better suited for different tasks; however, if you're just starting out, these differences are not significant. So stick with a foundational course and do your damn best to finish it completely (it's okay if you don't on the first go!). Stick with ONE language while you get your feet wet. Build a solid understanding of the foundations, and you will quickly find yourself making strides in your future learning.

I found CS50 to be a lot more challenging and rewarding than 6.00.1x, for the following reasons:

  • CS50 is longer than 6.00.1x and teaches more material (12 weeks vs 7) 
  • The instructor for CS50, David Malan, is much more energetic and enthusiastic than Eric Grimson, the prof for 6.00.1x 
  • The problem sets for CS50 force you to learn things on your own in order to complete them, whereas the problems for 6.00.1x are much more contained. The ability to learn things on your own by googling and reading documentation is really important for programming.
  • CS50 teaches C, a low-level language, in the beginning, and then switches gears to high-level languages like Python/PHP (when I took it) and JavaScript, whereas 6.00.1x only teaches Python. After using C, you'd be amazed by a lot of the built-in functionality that comes with Python, but at the same time appreciate why a low-level language like C continues to exist and dominate in certain fields
  • Plus, a lot of popular languages have C-inspired syntax (Java, C++, C#, etc) so you might have an easier time learning those as opposed to coming from Python (though having Python as a 1st language would not limit your learning in any way, it'll just take you some time to get used to using parentheses and curly braces as opposed to indentation)

You can certainly take both (I have), but honestly, taking 6.00.1x after CS50 is redundant in my opinion.

As someone who has taken both, CS50 all the way. I took 6.00.1x to get back into programming after taking a year and a half off after high school (I taught myself SuperCollider and Python using documentation and trial and error). I already knew Python but wasn't very good at thinking programmatically. In my opinion, the tests and assignments in 6.00.1x were too easy and too memorization-based. I learned significantly more when I took CS50. I feel like learning with C covers a much larger foundation, and you learn more about memory and what higher-level code actually does under the hood. You can leverage what you know in C to start learning C++ too; it really helped me along that path. Learning things yourself is a huge part of programming and CS50 embraces that fully, whereas 6.00.1x provides you with all of the necessary information and concepts to complete the tests. Personally, if I could do it over, I'd take CS50 first; then 6.00.1x.

VERDICT

Both courses are designed for novices, so neither assumes prior programming experience or knowledge. The main differences are the languages and the course structure. In 6.00x (MIT) you learn programming concepts with the very popular Python language, whereas in CS50x (Harvard) you learn the same concepts with several languages, such as C, PHP, and JavaScript, plus SQL, CSS, and HTML. Apart from this, I must say that CS50x (Harvard) is a little harder to learn than 6.00x (MIT).
  • Which of them takes more time?
CS50x (Harvard), but you should not worry, because Harvard sets no due dates for individual problem sets. You can submit them whenever you have time before the end date.
  • Which is more practical in real life?
It's very difficult to answer. But I think CS50x (Harvard) is more practical because it is less theoretical and focuses more on practice with different programming languages and patterns.
  • Which has the best teacher?
Both, no doubt. For 6.00x, I loved learning with Eric Grimson. He is an awesome teacher, and the quality of his teaching is great. You will learn almost everything you need by going through his lectures.
Prof Malan is one of the best lecturers I have had the pleasure to learn from, knowing how to keep an audience interested in just about anything. Malan and his army of TAs have also done a fantastic job of creating a community around CS50, and have provided a plethora of resources for their students. Their assignments and problem sets were also perfect. They were challenging and interesting without being inaccessible; teaching both programming skills and being relevant to important CS concepts. We also got a special lecture from Brian Kernighan at the end, which was a real treat.

I'm extremely appreciative of the skills and knowledge CS50 has given me, and would highly recommend it to anyone willing to take on the challenge. It's the reason why I have my current interest in and a basic understanding of CS.
  • What do you recommend?
I suggest you go with CS50x first; then you can try 6.00x. Both are fabulous courses and you will learn a lot. I took 6.00x in MIT's first batch on edX, and I must say that I enjoyed it very much.
Happy Learning...
Happy coding......


Monday, May 1, 2017

Python versus R for Machine Learning and Data Science


Most people I meet in my daily activities who are interested in both machine learning and data science have been asking about the best programming language for data science. R and Python immediately come to mind, but which of these two giants should you choose?


I have searched the internet and put together some resources and links to pages and blogs that will help us all decide which language to settle on. I will start with a summary of the resources from leading machine learning and data science blogs and websites such as Udacity, DataCamp, Reddit, EliteDataScience, opensource, KDnuggets, Quora, and datascience.stackexchange.
Verdict: Karlijn Willems, in her infographic comparison, concluded that: "It's a tie! It's up to you, the data scientist, to pick the language that best fits your needs".

Verdict: Cheng Han Lee in his analysis also concluded that, "In general, you can’t err whether you choose to learn Python first or R first for data analysis. Each language has its pros and cons for different scenarios and tasks. In addition, there are actually libraries to use Python with R, and vice versa, so learning one won’t preclude you from being able to learn and use the other. Perhaps the best solution is to use these guidelines:
- Personal preference
Choose the language to begin with based on your personal preference, on which comes more naturally to you, which is easier to grasp from the get-go. To give you a sense of what to expect, mathematicians and statisticians tend to prefer R, whereas computer scientists and software engineers tend to favor Python. The best news is that once you learn to program well in one language, it’s pretty easy to pick up others.

- Project selection
You can also make the Python vs. R call based on a project you know you’ll be working on in your data studies. If you’re working with data that’s been gathered and cleaned for you, and your main focus is the analysis of that data, go with R. If you have to work with dirty or jumbled data, or to scrape data from websites, files, or other data sources, you should start learning, or advancing your studies in, Python.

- Collaboration
Once you have the basics of data analysis under your belt, another criterion for evaluating which language to further your skills in is what language your teammates are using. If you’re all literally speaking the same language, it’ll make collaboration—as well as learning from each other—much easier.

- Job market
Jobs calling for skill in Python compared to R have increased similarly over the last few years.
Verdict: They concluded that "both languages are actively being developed and have an impressive suite of tools already. It sounds cliché to say this, but there's really no one-size-fits-all answer."


Verdict: Tom Radcliffe made the following comments: "The main issue with R is its consistency. Algorithms are provided by third parties, which makes them comparatively inconsistent. The resulting decrease in development speed comes from having to learn new ways to model data and make predictions with each new algorithm you use. Every package requires a new understanding. Inconsistency is true of the documentation as well, as R's documentation is almost always incomplete."


Verdict: KDnuggets using their figures and diagrams made the following observations:

"On the web, you can find many numbers comparing the adoption and popularity of R and Python. While these figures often give a good indication of how these two languages are evolving in the overall ecosystem of computer science, it’s hard to compare them side-by-side. The main reason for this is that you will find R only in a data science environment; As a general-purpose language, Python, on the other hand, is widely used in many fields, such as web development. This often biases the ranking results in favor of Python, while the salaries are affected somewhat negatively.
If you look at recent polls that focus on programming languages used for data analysis, R often is a clear winner. If you focus specifically on Python and R's data analysis community, a similar pattern appears.
Despite the above figures, there are signals that more people are switching from R to Python. Furthermore, there is a growing group of individuals using a combination of both languages when appropriate. This is exactly in line with what we recommend to our students as well."
  • My Analysis:
There are some things one will have to take into consideration before deciding on whether to choose R or Python. Think of the question as a problem statement. Your current inclinations and understanding of what you want in the future will help you reach an answer. 
It depends on the industry and even the country and continent. My overall feeling is that Python is stronger in the US and R is stronger in Europe, but I have no data to support this. Some companies use one or the other exclusively, whereas others use whatever gets each job done.
 
Case for Python
Python is a general-purpose programming language that can pretty much do anything you need it to: data munging, data engineering, data wrangling, website scraping, web app building, and more. It’s simpler to master than R if you have previously learned an object-oriented programming language like Java or C++. In addition, because Python is an object-oriented programming language, it’s easier to write large-scale, maintainable, and robust code with it than with R. Using Python, the prototype code that you write on your own computer can be used as production code if needed.
Although Python doesn’t have as comprehensive a set of packages and libraries available to data professionals as R, the combination of Python with tools like pandas, NumPy, SciPy, scikit-learn, and seaborn will get you pretty darn close. The language is also slowly becoming more useful for tasks like machine learning and basic to intermediate statistical work (formerly just R’s domain).


Why Python is Great for Data Science
  • Python was first released in 1991 (Guido van Rossum began work on it in late 1989). It has been around for a long time, and it has object-oriented programming baked in.
  • IPython / Jupyter's notebook IDE is excellent.
  • There's a large ecosystem. For example, Scikit-Learn's page receives 150,000 - 160,000 unique visitors per month.
  • There's Anaconda from Continuum Analytics, making package management very easy.
  • The Pandas library makes it simple to work with data frames and time-series data.
When and how to use Python?
You can use Python when your data analysis tasks need to be integrated with web apps or if statistics code needs to be incorporated into a production database. Being a fully-fledged programming language, it’s a great tool to implement algorithms for production use.
While the infancy of Python packages for data analysis was an issue in the past, this has improved significantly over the years. Make sure to install NumPy /SciPy (scientific computing) and pandas (data manipulation) to make Python usable for data analysis.  Also, have a look at matplotlib to make graphics and scikit-learn for machine learning.
Unlike R, Python has no clear “winning” IDE. We recommend having a look at Spyder, IPython Notebook, and Rodeo to see which one best fits your needs.

Case for R

Key quote: "There should be an interface to the very best numerical algorithms available." - John Chambers
R has a long and trusted history and a robust supporting community in the data industry. Together, those facts mean that you can rely on online support from others in the field if you need assistance or have questions about using the language. Plus, there are plenty of publicly released packages, more than 5,000 in fact, that you can download to use in tandem with R to extend its capabilities to new heights. That makes R great for conducting complex exploratory data analysis. R also integrates well with other computer languages like C++, Java, and C.
When you need to do heavy statistical analysis or graphing, R’s your go-to. Common mathematical operations like matrix multiplication work straight out of the box, and the language’s array-oriented syntax makes it easier to translate from math to code, especially for someone with no or minimal programming background.
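As a rough illustration of the "math to code" point, here is a minimal sketch in Python (used for consistency with the rest of this post; in R the built-in matrix product operator is `%*%`). The helper name `matmul` is made up for this example and works on plain lists of rows rather than any library type.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (illustrative helper)."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    # zip(*b) iterates over the columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

With NumPy installed, the same product on arrays is written `A @ B`, which is much closer to the one-operator experience R gives out of the box.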


Why R is Great for Data Science
  • R was created in 1992, after Python, and was therefore able to learn from Python's lessons.
  • Rcpp makes it very easy to extend R with C++.
  • RStudio is a mature and excellent IDE.
  • (Our note) CRAN is a candyland filled with machine learning algorithms and statistical tools.
  • (Our note) The Caret package makes it easy to use different algorithms from a single interface, much like what Scikit-Learn has done for Python.

When and how to use R?
R is mainly used when the data analysis task requires standalone computing or analysis on individual servers. It’s great for exploratory work, and it's handy for almost any type of data analysis because of the huge number of packages and readily usable tests that often provide you with the necessary tools to get up and running quickly. R can even be part of a big data solution.
When getting started with R, a good first step is to install the amazing RStudio IDE.  Once this is done, we recommend you to have a look at the following popular packages:



RECOMMENDATION (by https://datascience.stackexchange.com/a/339): 
  • Machine learning has two phases: model building and prediction. Typically, model building is performed as a batch process and predictions are made in real time. Model building is compute-intensive, while prediction happens in a jiffy. Therefore, the performance of an algorithm in Python or R doesn't really affect the turnaround time for the user. Python 1, R 1.
  • Production: The real difference between Python and R comes in being production-ready. Python is a full-fledged programming language, and many organizations use it in their production systems. R is statistical programming software favored by many in academia; with the rise of data science, the availability of libraries, and its being open source, industry has started using R as well. Many of these organizations have their production systems in Java, C++, C#, Python, etc. So, ideally, they would like to have the prediction system in the same language to reduce latency and maintenance issues. Python 2, R 1.
  • Libraries: Both languages have enormous and reliable libraries. R has over 5000 libraries catering to many domains while Python has some incredible packages like Pandas, NumPy, SciPy, Scikit Learn, Matplotlib. Python 3, R 2.
  • Development: Both languages are interpreted. Many say that Python is easy to learn, almost like reading English (to put it on a lighter note), but R requires more initial study effort. Also, both have good IDEs (Spyder, etc. for Python and RStudio for R). Python 4, R 2.
  • Speed: R software initially had problems with large computations (say, like nxn matrix multiplications). But, this issue is addressed with the introduction of R by Revolution Analytics. They have re-written computation-intensive operations in C which is blazingly fast. Python being a high-level language is relatively slow. Python 4, R 3.
  • Visualizations: In data science, we frequently tend to plot data to showcase patterns to users. Therefore, visualizations become important criteria in choosing software and R completely kills Python in this regard. Thanks to Hadley Wickham for an incredible ggplot2 package. R wins hands down. Python 4, R 4.
  • Dealing with Big Data: One of the constraints of R is it stores the data in system memory (RAM). So, RAM capacity becomes a constraint when you are handling Big Data. Python does well, but I would say, as both R and Python have HDFS connectors, leveraging Hadoop infrastructure would give substantial performance improvement. So, Python 5, R 5.
So, both languages are equally good. Therefore, depending on your domain and the place you work, you have to choose the right language wisely. The technology world usually prefers using a single language. Business users (marketing analytics, retail analytics) usually go with statistical programming languages like R, since they frequently do quick prototyping and build visualizations (which is faster in R than in Python).
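The two-phase point in the recommendation above (batch model building, then cheap real-time prediction) can be sketched in a few lines of plain Python. This is only a toy under stated assumptions: the model is a one-variable least-squares line, and the names `fit_line` and `predict` are hypothetical, not taken from any library.

```python
def fit_line(xs, ys):
    """Model building (batch): least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Prediction (real time): one multiply and one add per request."""
    slope, intercept = model
    return slope * x + intercept


# Training scans the whole data set once; this is the expensive step.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
model = fit_line(xs, ys)

# Serving a prediction only touches the two fitted numbers.
print(predict(model, 10))  # 21.0
```

Because the prediction step is so small, it is usually easy to re-implement in whatever language the production system already uses, which is the situation the "Production" bullet describes.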









