Hi, on this page you will find:
1. Machine Learning YouTube Link
2. Machine Learning Class Content
3. Every Lab Summary
4. Assignments
Understand What is Machine Learning in 2 Minutes
Machine Learning Class Content
https://drive.google.com/open?id=189cqwADhy76ruVr6luRx2GaX0KVxLAqv
Lab Exercises
IPython (Interactive Python) is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language, that offers introspection, rich media, shell syntax, tab completion, and history.
Exercises:
1. For a list of tuples, store the pairs which are coprime (GCD == 1).
Compare the timings using:
* List comprehension
* Nested for loops
* Using filter
2. Find out which is faster for squaring the numbers in a list
* By list comprehension (a ** 2 for each element of the list)
* map applied to a list
3. How can you multiply two lists elementwise with the help of a partial function?
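A rough sketch of how all three exercises could be attempted; the sample lists are made up for illustration, and in IPython %timeit on each expression gives the timings:

from math import gcd
from functools import partial
from operator import mul

pairs = [(3, 4), (6, 9), (7, 14), (10, 21)]        # sample data, for illustration only

# Exercise 1: keep the coprime pairs (gcd == 1), three ways.
# In IPython, time each with e.g.  %timeit [(a, b) for a, b in pairs if gcd(a, b) == 1]
by_comprehension = [(a, b) for a, b in pairs if gcd(a, b) == 1]

by_loops = []
for a, b in pairs:
    if gcd(a, b) == 1:
        by_loops.append((a, b))

by_filter = list(filter(lambda p: gcd(*p) == 1, pairs))

# Exercise 2: squaring every number in a list (note ** rather than ^ in Python).
nums = list(range(1000))
squares_comp = [a ** 2 for a in nums]
squares_map = list(map(lambda a: a ** 2, nums))

# Exercise 3: elementwise product of two lists via a partial function.
elementwise_mul = partial(map, mul)                # pre-binds operator.mul to map
print(list(elementwise_mul([1, 2, 3], [4, 5, 6]))) # [4, 10, 18]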
Note: All ML Lab exercises will be available here.
24 Jan 2019
Theory and Lab Exercises
Go to the link below (theory and exercises are taken from this URL):
https://github.com/
31 Jan 2019
Today the instructor covered NumPy operations.
More references:
https://github.com/akshaybadola/itlab2018/blob/master/slides/4-numpy.pdf
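A few representative NumPy operations of the kind the slides cover (the arrays below are made up for illustration):

import numpy as np

a = np.arange(12).reshape(3, 4)       # 3x4 array of 0..11
b = np.ones((3, 4))

print(a + b)                          # elementwise addition
print(a * 2)                          # scalar broadcasting
print(a.T @ b)                        # matrix product (4x3 times 3x4)
print(a[a % 2 == 0])                  # boolean-mask indexing
print(a.sum(axis=0), a.mean(axis=1))  # reductions along an axis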
7 Feb 2019
Today's class was about plotting using the Matplotlib Python library.
All exercises are from the link below, chapter 2.
https://github.com/akshaybadola/itlab2018/blob/master/slides/4-numpy.pdf
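A minimal plotting sketch along the lines of that chapter (the data is synthetic and only for illustration):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 200)
plt.plot(x, np.sin(x), label="sin(x)")
plt.plot(x, np.cos(x), "--", label="cos(x)")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.title("A simple Matplotlib plot")
plt.show()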
Assignment Time
Hi Guys,
For the first assignment you need to create your own data sets, and you need to submit a report on how your data converges with the C program code below.
The deadline for the first assignment is 28 Feb 2019.
You need to submit your report to vcpnair73@gmail.com
Need any help? Ping me (I'm happy to help).
To join our group: https://groups.google.com/forum/#!forum/cgfm2018
ML Lab First Assignment
Hi,
Below are the problem statements for the ML Lab first group assignment. Please check which problem you need to solve in
this link -> https://docs.google.com/spreadsheets/d/1rYs8WVs9a4b_4lw__uEulqlFjB72mfHZoFR9br9LWAg/edit?usp=sharing
Note: every group has a unique group number. Please follow it.
1. Wikimedia math fetcher enhancement:
I had written a MediaWiki script which fetches the first page in a search result query from Wikipedia, stores its content in
specified directories, and then gathers the LaTeX elements and
compiles them to TeX.
Given that code, write functions to:
1. Segregate surrounding text from the equations
2. A class to build a database such that, given a query, it
first searches the local store (pages and their content
are never deleted) and only goes online to retrieve if it
can't find anything.
3. Currently it merely fetches the topmost page. I would like
the function to be content addressable, as in, it searches
for keywords in the text data, which can be linked to the
math data and based on that returns suggestions. (whoosh can
help in that)
4. Currently only the LaTeX is separated out and the rest is
merely kept aside. I would like the relevant links in the
pages to be filtered out as well. They could be returned
as a separate object (a dictionary instance, for example).
Wikipedia has a specific data storage format which can be
accessed for free through its api. It is a vast repository of
curated data.
1. The API can be accessed with a Python client (I've used mwclient)
2. Wikipedia allows a search function which can be used to retrieve data
3. I have a script that filters out math data from the page,
what I would like is something that filters out ALL objects
and stores them for easy retrieval.
4. Objects can be:
- Links
- Images
- Citations
- Categories
- "See also"
- Text (obviously)
5. Fetching all that data in dictionary format would be
enough. No need to search and index it. However, data has to
be in an efficient format for retrieval. Some nested
dictionary types or custom types where given a page name, I
can get:
1. All the text, citations, references, links etc.
2. Given an index of an item (text, citation etc.) in the
page, I can get all the indices of other items which are
related to it (citations, images).
3. So it's an addressable store.
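The assignment builds on the instructor's existing script, which is not reproduced here; as a hedged sketch, the search-then-cache behaviour of point 2 and a naive version of the text/math separation might look roughly like this with mwclient (the store layout and helper names are my assumptions):

import os
import re
import mwclient

def fetch_first_result(query, store_dir="wiki_store"):
    # Look in the local store first; go online only on a miss.
    os.makedirs(store_dir, exist_ok=True)
    site = mwclient.Site("en.wikipedia.org")
    first = next(iter(site.search(query)), None)
    if first is None:
        return None
    title = first["title"]
    path = os.path.join(store_dir, title.replace("/", "_") + ".txt")
    if os.path.exists(path):                          # local store hit
        with open(path, encoding="utf-8") as f:
            return f.read()
    text = site.pages[title].text()                   # online fetch of the top page
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text

def split_math(wikitext):
    # Very rough separation of <math> markup from the surrounding text.
    equations = re.findall(r"<math>(.*?)</math>", wikitext, flags=re.S)
    prose = re.sub(r"<math>.*?</math>", "", wikitext, flags=re.S)
    return prose, equations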
- This task involves twitter data extraction. There's a twitter
API which lets you access data from twitter albeit with some
restrictions.
- All tweets are in public domain and free to use. For this
task, we will focus on a particular topic and gather all the
tweets related to it.
- For this given a topic (can be any topic) and associated tags
(#tags), the program should be able to retrieve the entire
flow of data in a given time period.
- Flow here means:
1. Who sent the tweet, when was it sent
- Attributes of the tweet like, how many times it was retweeted, or liked etc.
You'll need to work around the restrictions and rate limits that
Twitter's API imposes.
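The problem statement doesn't fix a client library; tweepy was a common choice at the time, so this sketch assumes tweepy's classic (v1.1) search API with placeholder credentials and a placeholder hashtag:

import tweepy

# Placeholder credentials; real ones come from the Twitter developer portal.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)       # wait out rate limits automatically

def gather_flow(hashtag, since, until, max_tweets=500):
    # Collect who tweeted, when, and basic attributes for a tag in a time window.
    flow = []
    cursor = tweepy.Cursor(api.search, q=hashtag, since=since, until=until,
                           tweet_mode="extended")
    for tweet in cursor.items(max_tweets):
        flow.append({
            "user": tweet.user.screen_name,
            "created_at": tweet.created_at,
            "retweets": tweet.retweet_count,
            "likes": tweet.favorite_count,
            "text": tweet.full_text,
        })
    return flow

tweets = gather_flow("#machinelearning", "2019-02-01", "2019-02-07")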
Flickr has a large repository of high quality images uploaded by
individuals. Flickr contains images both public and private and
flickr provides an API to retrieve the public data.
1. I'm not sure if the images are tagged or not, but that can be checked.
2. What we want is content based retrieval, i.e., images where
there is a particular activity taking place (skiing, sports)
or a particular object (Car, dog) is there.
3. Since tags may not be there, we'll use image classification
models to match the images.
4. Presence of a tag will not necessarily mean that the image
contains what we might be interested in. A combination of a
classifier and tags may be used.
5. Classifier and other things will be provided.
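A hedged sketch of the retrieval side using the flickrapi package; the API keys are placeholders, and the classifier check is only indicated in a comment since that part will be provided separately:

import flickrapi

flickr = flickrapi.FlickrAPI("API_KEY", "API_SECRET", format="parsed-json")

def search_public_photos(tags, per_page=20):
    # Return medium-size URLs of public photos matching the given tags.
    resp = flickr.photos.search(tags=tags, tag_mode="all",
                                per_page=per_page, extras="url_m")
    return [p["url_m"] for p in resp["photos"]["photo"] if "url_m" in p]

# These URLs would then be downloaded and passed through the provided
# classifier to confirm that each image really contains the object of interest.
print(search_public_photos("dog"))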
Visual Genome is a huge annotated image dataset with semantic
information generated by humans.
- The annotations are objects, their bounding boxes, synsets of
objects, relationships, synsets of relationships. (Search
online what a synset is)
- Going through all this data is a challenge and visualizing that even more.
For this a tool has to be developed, which mines the data for
particular objects or relationships and then visualizes them
with the help of advanced tools to detect patterns.
There are a lot of attributes there and the task can be split into two.
6. Data visualization for Visual Genome (with networkx or other tools)
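As a sketch of the visualization half, a small relationship graph can be built with networkx from (subject, relationship, object) triples; the triples below are made-up stand-ins for real Visual Genome annotations:

import networkx as nx
import matplotlib.pyplot as plt

triples = [                      # stand-ins for Visual Genome relationship annotations
    ("man", "riding", "horse"),
    ("man", "wearing", "hat"),
    ("horse", "on", "grass"),
    ("dog", "near", "horse"),
]

G = nx.DiGraph()
for subj, rel, obj in triples:
    G.add_edge(subj, obj, label=rel)

pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_color="lightblue", node_size=1500)
nx.draw_networkx_edge_labels(G, pos, edge_labels=nx.get_edge_attributes(G, "label"))
plt.show()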
1. HookPoint can be a type then. Should return properties
(which ideally should be automatically deduced
from where it is in the source code)
2. The name of the HookPoint should also be automatically inferred.
3. The HookPoint should be able to return where in the workflow
(based on a predefined workflow of the program) it is.
4. The HookPoint should know which variables are available to it.
Calling instance's reference is a special case.
5. Based on the name of a HookPoint and/or its properties and occurrence
in the workflow it should be possible to add any custom hook
(which is a callable) to the HookPoint.
6. It should be possible to query the HookPoint for which hooks
it contains, and to change their order if required.
- Science Parse is a library which parses scientific (pdf)
documents and extracts data from them.
- PDFs are not designed to be parsed and are not really structured
as such. Science Parse uses a Machine Learning model to operate on them.
- Science Parse is accessible via a server and it returns the
data in JSON format.
- Given access to such a server, you have to build an application that:
1. Cleans the data:
- There might be spelling errors and such in the data.
- References have to be crosschecked from a reliable source like DBLP
- Some data may need to be discarded.
2. Stores the data in an accessible format and indexes it
- Separate the text and references.
- Index using whoosh
3. Generates and visualizes graphs for:
- References
- Authors
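A hedged sketch of step 2 (storing and indexing with whoosh); the Science Parse endpoint and the JSON field names are assumptions based on a typical local deployment, not something fixed by the statement above:

import os
import requests
from whoosh import index
from whoosh.fields import Schema, TEXT, ID
from whoosh.qparser import QueryParser

def parse_pdf(path, server="http://localhost:8080/v1"):
    # Assumed local Science Parse server; the exact endpoint may differ.
    with open(path, "rb") as f:
        resp = requests.post(server, data=f,
                             headers={"Content-Type": "application/pdf"})
    resp.raise_for_status()
    return resp.json()           # assumed fields: title, abstractText, references, ...

schema = Schema(title=TEXT(stored=True), body=TEXT, path=ID(stored=True, unique=True))

def build_index(json_docs, index_dir="sp_index"):
    # json_docs: dict mapping a pdf path to its parsed JSON.
    os.makedirs(index_dir, exist_ok=True)
    ix = index.create_in(index_dir, schema)
    writer = ix.writer()
    for path, doc in json_docs.items():
        writer.add_document(title=doc.get("title", ""),
                            body=doc.get("abstractText", ""),
                            path=path)
    writer.commit()
    return ix

def search(ix, text):
    with ix.searcher() as s:
        hits = s.search(QueryParser("body", ix.schema).parse(text))
        return [hit["title"] for hit in hits]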
FEVER is a large dataset which has assertions and supporting statements;
see http://fever.ai/
TBD
Similar task to the above, but with either a different processing step
on the same dataset or a different dataset.
TBD
Fabric is a system administration tool which lets you execute a
single task on multiple machines connected to the network.
Some time back I had written a fabric script which would
1. Try to wake up all the machines in AI Lab
2. Get all the running machines in the AI Lab
3. Return a list of all the machines which are not running.
4. Perform a given task which is written in a function to all
the running machines in parallel
5. Store the output in a dictionary.
However, the Fabric library has since been upgraded and the script
no longer works. You have to update it to get it working.
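A hedged sketch of what the upgraded script might look like with the Fabric 2.x API; the hostnames are placeholders and the wake-on-LAN step of the original script is omitted:

from fabric import ThreadingGroup
from fabric.exceptions import GroupException

HOSTS = ["ailab-01", "ailab-02", "ailab-03"]      # placeholder hostnames

def run_everywhere(command):
    # Run a shell command on all machines in parallel; collect stdout per host.
    group = ThreadingGroup(*HOSTS)
    outputs, unreachable = {}, []
    try:
        results = group.run(command, hide=True, warn=True)
    except GroupException as exc:                 # some hosts were unreachable
        results = exc.result
    for conn, result in results.items():
        if isinstance(result, Exception):         # connection error for this host
            unreachable.append(conn.host)
        else:
            outputs[conn.host] = result.stdout.strip()
    return outputs, unreachable

running, down = run_everywhere("uname -a")
print("running:", running)
print("not responding:", down)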
Note: further details are available here: https://groups.google.com/d/forum/cgfm2018
Previous Years' Answers: Important Links
https://web.cs.wpi.edu/~cs539/s99/practice_exam1.html
https://web.cs.wpi.edu/~cs539/s99/solutions_practice_exam1.html
http://www.timvanerven.nl/wp-content/uploads/teaching/ml0708/exercises/mlex2-answers.pdf