Wednesday, January 9, 2019

Python Classes 2019


Hi, on this page you will find:

1. Machine Learning YouTube link
2. Machine Learning class content
3. A summary of every lab
4. Assignments

Understand What Machine Learning Is in 2 Minutes

Machine Learning Class Content

https://drive.google.com/open?id=189cqwADhy76ruVr6luRx2GaX0KVxLAqv 


Lab Exercises

IPython (Interactive Python) is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language, that offers introspection, rich media, shell syntax, tab completion, and history.
Date : 10/01/2019
Exercise :

1. For a list of tuples, store the pairs which are coprime (GCD == 1).
  Compare the timings of:
      * List comprehension
      * Nested for loops
      * Using filter
2. Find out which is faster for squaring the numbers in a list
      * A list comprehension (x ** 2 for each x; note that ^ is XOR in Python, not power)
      * map applied to the list
3. How can you multiply two lists elementwise with the help of a partial function?
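A minimal sketch of all three exercises (my own illustration, not the official solution; the `elementwise` helper is a name I made up):

```python
import timeit
from functools import partial
from math import gcd
from operator import mul

pairs = [(3, 4), (6, 9), (7, 13), (10, 15), (8, 21)]

# 1. Keep the coprime pairs (gcd == 1), three ways.
by_comprehension = [(a, b) for a, b in pairs if gcd(a, b) == 1]

by_loop = []
for a, b in pairs:
    if gcd(a, b) == 1:
        by_loop.append((a, b))

by_filter = list(filter(lambda p: gcd(p[0], p[1]) == 1, pairs))

# Compare timings with timeit.
t_comp = timeit.timeit(lambda: [p for p in pairs if gcd(*p) == 1], number=10000)
t_filt = timeit.timeit(lambda: list(filter(lambda p: gcd(*p) == 1, pairs)), number=10000)

# 2. Squaring: in Python `**` is the power operator (`^` is XOR).
nums = list(range(100))
squares_comp = [x ** 2 for x in nums]
squares_map = list(map(lambda x: x ** 2, nums))

# 3. Elementwise product of two lists via a partial function:
# fix the operator argument of a generic two-list helper.
def elementwise(op, xs, ys):
    return [op(x, y) for x, y in zip(xs, ys)]

mul_lists = partial(elementwise, mul)
print(mul_lists([1, 2, 3], [4, 5, 6]))  # [4, 10, 18]
```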


Note: Every ML lab exercise will be available here.

24 Jan 2019 
Theory and Lab Exercises

Go to the link below (the theory and exercises are taken from this URL):
https://github.com/akshaybadola/itlab2018 


31 Jan 2019 

Today's class covered NumPy operations.

More references:
https://github.com/akshaybadola/itlab2018/blob/master/slides/4-numpy.pdf 
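For quick reference, a small sketch of the kind of NumPy operations covered (standard calls only; the examples are my own):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # 3x4 matrix of 0..11

# Elementwise arithmetic with broadcasting: the row vector is
# stretched across all three rows of `a`.
row = np.array([1, 0, 1, 0])
b = a * row

# Reductions along an axis.
col_sums = a.sum(axis=0)          # shape (4,)
row_means = a.mean(axis=1)        # shape (3,)

# Boolean masking selects elements satisfying a condition.
evens = a[a % 2 == 0]

# Matrix product of `a` with its transpose -> 3x3 Gram matrix.
gram = a.dot(a.T)

print(b.shape, col_sums, evens[:3])
```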

7 Feb 2019

Today's class was about plotting using the Matplotlib Python library.
All exercises are from the link below, chapter 2.

https://github.com/akshaybadola/itlab2018/blob/master/slides/4-numpy.pdf
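A minimal Matplotlib sketch in the spirit of that chapter (the Agg backend and the output file name are my choices for a non-interactive run; in class you would call plt.show() instead):

```python
import matplotlib
matplotlib.use("Agg")          # non-interactive backend, renders to a file
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), "--", label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Basic line plot")
ax.legend()
fig.savefig("sincos.png")      # use plt.show() in an interactive session
```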
 

Assignment Time

Hi Guys, 
In the first assignment you need to build your own dataset, and submit a report on how your data converges with the C program code below.

The deadline for the first assignment is 28 Feb 2019.
Submit your report to vcpnair73@gmail.com

Need any help? Ping me (I'm happy to help).

To join our group: https://groups.google.com/forum/#!forum/cgfm2018

https://drive.google.com/open?id=1WcD0_FusbWGDQOKlEZGxAyphq8Df26V3


ML Lab First Assignment 

Hi,
Below are the problem statements for the ML Lab first group assignment. Check which problem your group needs to solve at this link -> https://docs.google.com/spreadsheets/d/1rYs8WVs9a4b_4lw__uEulqlFjB72mfHZoFR9br9LWAg/edit?usp=sharing

Note: Every group has a unique group number. Please follow it.

1. Wikimedia math fetcher enhancement:
     I had written a MediaWiki script which fetches the first page
     in a search result query from Wikipedia, stores its content in
     specified directories, and then gathers the LaTeX elements and
     compiles them to TeX.
     Given that code, write functions to:
     1. Segregate surrounding text from the equations
     2. A class to build a database such that given a query, first it
        searches in the local store (pages and their content are
        never deleted) and if it can't find something, only then goes
        online and retrieves.
     3. Currently it merely fetches the topmost page. I would like
        the function to be content addressable, as in, it searches
        for keywords in the text data, which can be linked to the
        math data and based on that returns suggestions. (whoosh can
        help in that)
     4. Currently only the latex is separated out and the rest is
        merely kept aside. I would like functionality to separate the
        relevant links in the pages also to be filtered out. They
        could be returned as a separate object (dictionary instance
        for example).
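As a rough illustration of step 1 (segregating surrounding text from the equations), assuming the stored pages keep equations inside `<math>` tags — the script's actual storage format may differ:

```python
import re

# Assumed input: raw wikitext where equations sit inside <math>...</math>
# tags (an assumption; check the fetcher's real output format).
MATH_RE = re.compile(r"<math[^>]*>(.*?)</math>", re.DOTALL)

def segregate(wikitext):
    """Split a page into (plain_text, [latex equations])."""
    equations = MATH_RE.findall(wikitext)
    text = MATH_RE.sub(" [EQ] ", wikitext)   # placeholder keeps position info
    return text, equations

sample = "Euler's identity <math>e^{i\\pi} + 1 = 0</math> links five constants."
text, eqs = segregate(sample)
print(eqs)  # ['e^{i\\pi} + 1 = 0']
```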


  2. Wikimedia content of interest extraction.
     Wikipedia has a specific data storage format which can be
     accessed for free through its api. It is a vast repository of
     curated data.
     1. The api can be accessed with a python client (I've used mwclient)
     2. Wikipedia allows a search function which can be used to retrieve data
     3. I have a script that filters out math data from the page,
        what I would like is something that filters out ALL objects
        and stores them for easy retrieval.
     4. Objects can be:
        - Links
        - Images
        - Citations
        - Categories
        - "See also"
        - Text (obviously)
     5. Fetching all that data in dictionary format would be
        enough. No need to search and index it. However, data has to
        be in an efficient format for retrieval. Some nested
        dictionary types or custom types where given a page name, I
        can get:
        1. All the text, citations, references, links etc.
        2. Given an index of an item (text, citation etc.)  in the
           page, I can get all the indices of other items which are
           related to it (citations, images).
        3. So it's an addressable store.
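A toy sketch of the local-first store described above (`PageStore` and `fake_fetch` are hypothetical names; the real fetcher would go through mwclient):

```python
class PageStore:
    """Toy local-first store: look up a page locally, and fall back
    to a fetch function only on a miss. Cached pages are never deleted."""

    def __init__(self, fetch):
        self._pages = {}        # page name -> {"text": ..., "links": ..., ...}
        self._fetch = fetch     # called only when the page isn't cached

    def get(self, name):
        if name not in self._pages:
            self._pages[name] = self._fetch(name)
        return self._pages[name]

# A stand-in fetcher; the real one would query Wikipedia online.
def fake_fetch(name):
    return {"text": f"content of {name}", "links": [], "citations": []}

store = PageStore(fake_fetch)
page = store.get("Fourier transform")
again = store.get("Fourier transform")   # served from the local store
print(page["text"])
```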



  3. Twitter data extraction to study rate of information change
     - This task involves twitter data extraction. There's a twitter
       API which lets you access data from twitter albeit with some
       restrictions.
     - All tweets are in public domain and free to use. For this
       task, we will focus on a particular topic and gather all the
       tweets related to it.
     - For this given a topic (can be any topic) and associated tags
       (#tags), the program should be able to retrieve the entire
       flow of data in a given time period.
     - Flow here means:
       1. Who sent the tweet, when was it sent
     - Attributes of the tweet like, how many times it was retweeted, or liked etc.
     You'll need to work around the restrictions that Twitter's API
     imposes.



  4. Flickr functions for retrieving and sanitizing images
     Flickr has a large repository of high quality images uploaded by
     individuals. Flickr contains images both public and private and
     flickr provides an API to retrieve the public data.
     1. I'm not sure if the images are tagged or not, but that can be checked.
     2. What we want is content based retrieval, i.e., images where
        there is a particular activity taking place (skiing, sports)
        or a particular object (Car, dog) is there.
     3. Since tags may not be there, we'll use image classification
        models to match the images.
     4. Presence of a tag will not necessarily mean that the image
        contains what we might be interested in. A combination of a
        classifier and tags may be used.
     5. Classifier and other things will be provided.



  5. Data visualizations for Visual Genome with (networkx or other tools?) - 1
     Visual Genome is a huge annotated image dataset with semantic
     information generated by humans.
     - The annotations are objects, their bounding boxes, synsets of
       objects, relationships, synsets of relationships. (Search
       online what a synset is)
     - Going through all this data is a challenge and visualizing that even more.
     For this a tool has to be developed, which mines the data for
     particular objects or relationships and then visualizes them
     with the help of advanced tools to detect patterns.
     There are a lot of attributes there and the task can be split into two.


  6. Data visualizations for Visual Genome with (networkx or other tools?) - 2


  7. HookPoint: Arbitrary code insertion in a sequential stateful program
     1. HookPoint can be a type then. Should return properties
        (which ideally should be automatically deduced
         from where it is in the source code)
     2. The name of the HookPoint should also be automatically inferred.
     3. The HookPoint should be able to return where in the workflow
        (based on a predefined workflow of the program) it is.
     4. The HookPoint should know which variables are available to it.
        Calling instance's reference is a special case.
     5. Based on the name of a HookPoint and/or its properties and occurrence
        in the workflow it should be possible to add any custom hook
        (which is a callable) to the HookPoint.
     6. It should be possible to query the HookPoint, which hooks does
       it contain and to change the order if required.
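The points above could be sketched as a small Python class (a hypothetical design of my own, not the required interface):

```python
import inspect

class HookPoint:
    """A named point in a workflow where callables can be attached."""

    def __init__(self, name=None):
        # Point 2: infer a name from the calling frame if none is given.
        if name is None:
            frame = inspect.stack()[1]
            name = f"{frame.function}:{frame.lineno}"
        self.name = name
        self._hooks = []

    def add_hook(self, hook):
        # Point 5: any callable can be attached.
        self._hooks.append(hook)

    def hooks(self):
        # Point 6: the hooks are queryable (reorder by re-adding).
        return list(self._hooks)

    def __call__(self, **variables):
        # Point 4: each hook sees the variables available at this point.
        for hook in self._hooks:
            hook(**variables)

point = HookPoint("after_step_1")
seen = []
point.add_hook(lambda **v: seen.append(v["x"]))
point(x=42)
print(seen)  # [42]
```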



  8. Given access to science-parse server and data
     - Science Parse is a library which parses scientific (pdf)
       documents and extracts data from them.
     - PDFs are not designed to be parsed and are not really structured
       as such. Science Parse uses a Machine Learning model to operate on them.
     - Science Parse is accessible via a server and it returns the
       data in JSON format.
     - Given access to such a server you have to build an application to:
       1. Clean the data:
          - There might be spelling errors and such in the data.
          - References have to be crosschecked from a reliable source like DBLP
          - Some data may need to be discarded.
       2. Store in an accessible format and index it
          - Separate the text and references.
          - Index using whoosh
       3. Generate and visualize graphs for:
          - References
          - Authors



  9. Fake News
     FEVER is a large dataset which has assertions and supporting statements
     see http://fever.ai/
     TBD



  10. Fake News
      Similar task to the above, but with either a different processing
      step on the same dataset or a different dataset
      TBD



  11. Fabric upgrade
      Fabric is a system administration tool which lets you execute a
      single task on multiple machines connected to the network.
      Some time back I had written a fabric script which would
      1. Try to wake up all the machines in AI Lab
      2. Get all the running machines in the AI Lab
      3. Return a list of all the machines which are not running.
      4. Perform a given task which is written in a function to all
         the running machines in parallel
      5. Store the output in a dictionary.
      However, the fabric library has been upgraded and the script no
      longer works. You have to upgrade it to get it working again.


Note: Further details are available here: https://groups.google.com/d/forum/cgfm2018


Previous Year Answers: Important Links

https://web.cs.wpi.edu/~cs539/s99/practice_exam1.html

https://web.cs.wpi.edu/~cs539/s99/solutions_practice_exam1.html 

http://www.timvanerven.nl/wp-content/uploads/teaching/ml0708/exercises/mlex2-answers.pdf
