diff --git a/README.md b/README.md
index 4b438d9..c1a7b93 100644
--- a/README.md
+++ b/README.md
@@ -7,7 +7,7 @@ The primer is spread across a collection of [IPython Notebooks](http://ipython.o
 There are four versions of the primer. Three versions contain the entire primer in a single notebook:
 
 * Single IPython Notebook (cleared output cells): [Python_for_Data_Science_clean.ipynb](Python_for_Data_Science_clean.ipynb)
-* Single IPython Notebook (filled output cells): [Python_for_Data_Science_clean.ipynb](Python_for_Data_Science_all.ipynb)
+* Single IPython Notebook (filled output cells): [Python_for_Data_Science_all.ipynb](Python_for_Data_Science_all.ipynb)
 * Single web page (HTML): [Python_for_Data_Science_all.html](Python_for_Data_Science_all.html)
 
 The other version divides the primer into 5 separate notebooks:
@@ -56,4 +56,4 @@ There are also 2 data files, based on the [mushroom dataset](https://archive.ics
 * Changed ["call by reference"](https://en.wikipedia.org/wiki/Evaluation_strategy#Call_by_reference) to ["call by sharing"](https://en.wikipedia.org/wiki/Evaluation_strategy#Call_by_sharing)
 * Added `isinstance()` (and reference to duck typing) to section on `type()`
 * Added variable for `delimiter` rather than hard-coding `'|'` character
-* Cleaned up various cells
\ No newline at end of file
+* Cleaned up various cells
diff --git a/simple_ml.py b/simple_ml.py
index aedd741..5bb84a6 100644
--- a/simple_ml.py
+++ b/simple_ml.py
@@ -195,7 +195,7 @@ def entropy(instances, class_index=0, attribute_name=None, value_name=None):
             '={}'.format(value_name) if value_name else ''))
     for value in value_counts:
         value_probability = value_counts[value] / num_instances
-        child_entropy = value_probability * math.log(value_probability, num_values)
+        child_entropy = value_probability * math.log(value_probability, 2)
         attribute_entropy -= child_entropy
         if attribute_name:
             print(' - p({0}) x log(p({0}), {1}) = - {2:5.3f} x log({2:5.3f}) = {3:5.3f}'.format(
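The `simple_ml.py` hunk changes the logarithm base in `entropy()` from `num_values` (the number of distinct class values) to 2. Each term `-p * log2(p)` is then measured in bits, giving standard Shannon entropy. With base `num_values`, the result is instead normalized to the range 0 to 1. The sketch below contrasts the two. `entropy_bits` and `entropy_base_k` are hypothetical helpers that take a plain list of class labels; they are not the `entropy()` signature from `simple_ml.py`.

```python
import math
from collections import Counter


def entropy_bits(labels):
    """Shannon entropy in bits (log base 2), as in the '+' line of the diff.

    Hypothetical helper: takes a list of class labels, not simple_ml's instances.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for count in Counter(labels).values():
        p = count / n  # Counter only holds labels that occur, so p > 0
        h -= p * math.log(p, 2)
    return h


def entropy_base_k(labels):
    """Entropy with log base = number of distinct labels, as in the '-' line.

    Hypothetical helper. The result is normalized to [0, 1]; the guard avoids
    a log base of 1, which would divide by zero.
    """
    n = len(labels)
    counts = Counter(labels)
    k = len(counts)
    if k <= 1:
        return 0.0
    return -sum((c / n) * math.log(c / n, k) for c in counts.values())


# Two classes: both bases are 2, so the results agree (1.0 for an even split).
print(entropy_bits(['e', 'e', 'p', 'p']), entropy_base_k(['e', 'e', 'p', 'p']))
# Three evenly split classes: about 1.585 bits versus a normalized 1.0.
print(entropy_bits(['a', 'b', 'c']), entropy_base_k(['a', 'b', 'c']))
```

When there are exactly two class values, `num_values` is already 2, so the two versions agree; the change only alters results when a set of instances has three or more distinct class values.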