Tuesday, November 26, 2019

Behavior Recognition System Based on Convolutional Neural Network

This article is based on this research paper.
Credit: Bo YU

What will we do?

We build a human behavior recognition system based on a convolutional neural network designed for specific human behaviors in public places.

(i) Firstly, the videos in the human behavior dataset are segmented into images, and the images are processed with background subtraction to extract the moving foreground of the human body. (ii) Secondly, the training datasets are fed into the designed convolutional neural network, and the deep learning network is trained by stochastic gradient descent.
(iii) Finally, the behaviors in the samples are classified and identified with the obtained network model, and the recognition results are compared with the current mainstream methods. The results show that the convolutional neural network can learn a human behavior model automatically and identify human behaviors without manually annotated training data. A minimal sketch of such a network follows below.
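As an illustration of step (ii), here is a minimal sketch of a small convolutional network trained with stochastic gradient descent, written with Keras. The architecture, the 64x64 binary-image input size, and the number of behavior classes are illustrative assumptions, not the paper's exact design.

#a small CNN trained with SGD (illustrative sketch)
from tensorflow.keras import layers, models, optimizers

num_classes = 6  #assumed number of behavior categories

model = models.Sequential([
    layers.Conv2D(32, (5, 5), activation='relu', input_shape=(64, 64, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (5, 5), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes, activation='softmax'),
])

#the network parameters are learned by stochastic gradient descent
model.compile(optimizer=optimizers.SGD(learning_rate=0.01),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
#model.fit(x_train, y_train, epochs=20, batch_size=32)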

Human behavior recognition mainly consists of two processes: human behavior feature extraction, and the identification and understanding of motion.

This algorithm is mainly composed of three parts:
1. Video preprocessing, 2. Model training, 3. Behavior recognition.


In the video preprocessing part, the original behavior videos are preprocessed first: a block-updating background subtraction method is used for target detection, and binary images of the motion information are extracted. These binary images are then fed into the convolutional neural network, whose parameters are trained iteratively to build a convolutional behavior recognition model. Finally, this network can be used to identify human behavior. A minimal sketch of the preprocessing step follows below.
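The paper uses a block-updating background subtraction method; as a stand-in illustration of the same preprocessing idea, the sketch below uses OpenCV's MOG2 background subtractor to turn each frame into a binary foreground image. The file and folder names are assumptions.

import cv2,os

if not os.path.exists('Foreground'):
    os.makedirs('Foreground')

video = cv2.VideoCapture("behavior.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2()

count = 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    #foreground mask of the moving body
    mask = subtractor.apply(frame)
    #threshold the mask into a clean binary motion image
    _, binary = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
    cv2.imwrite("Foreground/frame%d.jpg" % count, binary)
    count += 1

video.release()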




Implementation Code on Github : Code


Thursday, November 14, 2019

Frame Extraction From Multiple Videos @x fps (Frame Per Second) In Python

This article is an extension of the previous article. In this article we will extract frames from multiple videos having the same duration.

We are assuming the following condition:

1. All videos have the same duration.

In the code below, you only need to provide the paths of the videos.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 14 09:47:58 2019

@author: Jo
"""
#Frame extraction from multiple videos with OpenCV
import cv2,os
from moviepy.editor import VideoFileClip

#location of the input files
inputfilepaths=["Dataset/big1.mp4","Dataset/big2.mp4","Dataset/big3.mp4","Dataset/big4.mp4"
                ,"Dataset/big5.mp4","Dataset/big6.mp4","Dataset/big7.mp4"]

#For saving output 
if not os.path.exists('Frames'):
    os.makedirs('Frames')

video=[]
#open each video file
for x in range(0,len(inputfilepaths)):
    video.append(cv2.VideoCapture(inputfilepaths[x]))
    print(x+1,"Video length ",VideoFileClip(inputfilepaths[x]).duration,"seconds")

def extractframe(sec):
    # video[x].set(cv2.CAP_PROP_POS_MSEC,sec*1000) skips directly to the
    # given second in each video (the sec*1000th millisecond)
    hasframes=[]

    for x in range(0,len(video)):
        video[x].set(cv2.CAP_PROP_POS_MSEC,sec*1000)
        #read each frame only once; a second read() would skip ahead
        hasimage,image=video[x].read()
        hasframes.append(hasimage)
        if hasimage:
            #write to location; count avoids name conflicts between images
            cv2.imwrite("Frames/{0}video{1}.jpg".format(x+1,count), image)

    #since all videos have the same duration, stop when any of them runs out
    return all(hasframes)

#starting from 0th second
sec = 0
#sampling interval: capture one frame every 0.5 seconds, i.e. 2 fps
frameRate = 0.5
count=1

#check whether frames are there in the videos or not
success = extractframe(sec)
while success:
    #increase the counter to avoid name conflicts
    count = count + 1
    #setting sec
    sec = sec + frameRate
    sec = round(sec, 2)
    print(sec)
    success = extractframe(sec)
 

*----------------------------------------------------------      
Github Link : 

Frame Extraction From Video @x fps (Frame Per Second) In Python

A video is a collection of frames. In this article we will see how to extract frames from a video @x fps (frames per second) using Python.

A frame is one of the many still images which compose the complete moving picture. When the moving picture is displayed, each frame is flashed on a screen for a short time (nowadays usually 1/24, 1/25, or 1/30 of a second) and then immediately replaced by the next one.

The frame is also sometimes used as a unit of time, so that a momentary event might be said to last six frames; the actual duration depends on the frame rate of the system, which varies according to the video or film standard in use (at 25 frames/s, for example, six frames last 6/25 = 0.24 seconds). In North America and Japan, 30 frames per second (fps) is the broadcast standard, with 24 frames/s now common in production for high-definition video shot to look like film. In much of the rest of the world, 25 frames/s is standard.

The code below extracts frames from a video at 2 fps.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 14 09:47:58 2019

@author: Jo
"""
#Frame extraction with OpenCV
import cv2,os
from moviepy.editor import VideoFileClip

inputfilepath="Dataset/vid1.mov"

#For saving output 
if not os.path.exists('Frames'):
    os.makedirs('Frames')

#location of video file
video=cv2.VideoCapture(inputfilepath)

#For Getting Clip duration
clip = VideoFileClip(inputfilepath)
print(clip.duration,"seconds")

def extractframe(sec):
    # video.set(cv2.CAP_PROP_POS_MSEC,sec*1000) skips directly to the
    # given second in the video (the sec*1000th millisecond)
    video.set(cv2.CAP_PROP_POS_MSEC,sec*1000)
    #reading frame
    hasframes,image = video.read()
       
    if hasframes:
        #write to location; count avoids name conflicts between images
        cv2.imwrite("Frames/image"+str(count)+".jpg", image)  # save frame as JPG file
      
    return hasframes

#starting from 0th second
sec = 0

#sampling interval: capture one frame every 0.5 seconds, i.e. 2 fps
frameRate = 0.5
count=1

#check whether frames are there in the video or not
success = extractframe(sec)
while success:
    #increase the counter to avoid name conflicts
    count = count + 1
    #setting sec
    sec = sec + frameRate
    sec = round(sec, 2)
    print(sec)
    success = extractframe(sec)

      
Code at Github
*********************************************
Next Article :
Frame Extraction From Multiple Videos @x fps (Frame Per Second) In Python

Tuesday, November 12, 2019

The WILDTRACK Multi-Camera Person Dataset


This article is based on this research paper. For more, click here.

Abstract of the paper

People detection methods are highly sensitive to the perpetual occlusion among the targets. As multi-camera set-ups become more frequently encountered, joint exploitation of the information across views allows for improved detection performance. We provide a large-scale HD dataset named WILDTRACK, which finally makes advanced deep learning methods applicable to this problem.

In summary, we give an overview of existing multi-camera datasets and detection methods, describe the details of our dataset, and benchmark state-of-the-art multi-camera detectors on this new dataset.

Introduction

Pedestrian detection is a sub-category of object detection. Despite the remarkable recent advances, notably owing to the integration of deep learning methods, the performance of these monocular detectors remains limited to scenes with at most moderate occlusion. This limitation is to be expected: given only a monocular observation, the underlying cause, in our case the persons to identify, is ambiguous in highly occluded scenes.

This is where multi-camera detectors come in handy. In general, simple averaging of the per-view predictions can only marginally improve upon a single-view detector. More sophisticated methods jointly make use of the information from all views to yield a prediction.

In summary :

1. We provide a large-scale HD dataset suitable for:
-Multi-View detection
-Monocular detection
-Camera calibration

2. We provide experimental benchmark results of state-of-the-art multi-camera detection methods on this dataset.

3. We give an overview of the existing methods and datasets, and discuss research directions.
 
Reference
https://www.epfl.ch/labs/cvlab/data/data-wildtrack/

Thursday, October 31, 2019

Highlight Creation Using OpenCV for ATM Videos

This article is a follow-up to our previous article. In the previous article we talked about generating video highlights using the short-time energy approach. But we cannot generate highlights for every video using that approach, because short-time energy requires audio in the video. Many videos, like ATM videos or CCTV footage, do not have audio.


By using OpenCV we can generate highlights of such videos. Here our objective is to detect humans from the web camera and make a video highlight.

We are using the following:
1. Python Programming Language
2. OpenCV Library
3. Spyder IDE
4. Inbuilt web camera

Our Approach 

1. We use a Haar cascade classifier to detect human faces from the web camera.
2. We write the camera feed into small clips in which human faces are detected, ignoring the other frames, and save those clips into a folder.
3. At last we merge all the clips to generate the highlight (a minimal sketch of steps 1 and 2 follows below).
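Here is a minimal sketch of steps 1 and 2, assuming the frontal-face Haar cascade bundled with OpenCV, an AVI container, and illustrative parameter values; it is not the exact code from the linked post.

import cv2,os

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

if not os.path.exists('Clips'):
    os.makedirs('Clips')

cap = cv2.VideoCapture(0)  #inbuilt web camera
fourcc = cv2.VideoWriter_fourcc(*'XVID')
writer, clip_count, fps = None, 0, 20.0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    if len(faces) > 0:
        if writer is None:
            #a face appeared: start a new clip
            clip_count += 1
            h, w = frame.shape[:2]
            writer = cv2.VideoWriter("Clips/clip%d.avi" % clip_count,
                                     fourcc, fps, (w, h))
        writer.write(frame)
    elif writer is not None:
        #no face any more: close the current clip
        writer.release()
        writer = None
    if cv2.waitKey(1) & 0xFF == ord('q'):  #press q to stop recording
        break

cap.release()
if writer is not None:
    writer.release()

The saved clips can then be merged, for example with moviepy's concatenate_videoclips, to form the final highlight (step 3).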

Our program is running on the following system configuration:

1. Intel i7 Processor
2. 8 GB RAM
3. Windows 8.1
4. OpenCV 4.4.1
5. Spyder 3.3.6


The Code with explanation is Here


Friday, October 18, 2019

Human Detection with OpenCV


Human detection is a type of object detection in computer vision.

Image credit : Google.com


What is Object Detection?

Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. (Wikipedia definition)

This task involves both identifying the presence of objects and identifying the rectangular boundary surrounding each object (i.e., object localisation).

An object detection system that can detect the class “Human” can work as a human detection system.

We can detect humans using the following algorithms:

1. Haar cascade (research paper: Haar Cascade)
2. HOG-based approaches


1. Haar Cascade Approach : 

This approach was proposed by Paul Viola and Michael Jones in their paper “Rapid Object Detection using a Boosted Cascade of Simple Features”, published in 2001. It is widely used for face detection.

More About Haar Cascade


2. Histograms of Oriented Gradients (HOG) for Human Detection
This approach was proposed by N. Dalal and B. Triggs in their paper “Histograms of oriented gradients for human detection”, published in 2005. A minimal sketch with OpenCV's built-in HOG people detector follows below.
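OpenCV ships a default HOG people detector in the spirit of Dalal and Triggs. The sketch below illustrates that detector; the input and output file names are assumptions.

import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

image = cv2.imread("people.jpg")
#detectMultiScale returns bounding boxes (x, y, w, h) and confidence weights
boxes, weights = hog.detectMultiScale(image, winStride=(8, 8))

#draw a rectangle around every detected person
for (x, y, w, h) in boxes:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("people_detected.jpg", image)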


Thursday, October 17, 2019

Video Highlight Creation of Massive Video Feeds

We see videos in daily life. In this article we will talk about how to create video highlights.


Highlight means focusing on the main events in a video. In sports videos, a highlight distills the most key, salient, and interesting parts of the video. In a 50-over cricket match, for example, a highlight covers events such as falling wickets, boundaries, catches, run-outs, and umpire decisions.
Highlight generation is the process of extracting the most interesting clips from a video.
Basic idea: whenever an interesting event occurs, there is an increase in the commentator's voice as well as the noise of the spectators.

There are many approaches to generating video highlights. Which technique we use depends on the problem domain.

1. Short Time Energy
The best thing about this approach is that you don't need training data for your model.
Question: What is short-time energy?
Answer: The short-time energy is the energy of a short speech segment.

The energy or power of an audio signal refers to the loudness of the sound. It is computed as the sum of the squares of the amplitudes of the audio signal in the time domain. When the energy is computed for a chunk of the entire audio signal, it is known as the short-time energy.

 Step By Step Process
  1. Input the Video
  2. Extract the audio
  3. Break the audio into chunks
  4. Compute short-time energy of every chunk
  5. Classify every chunk as excitement or not(based on a threshold value)
  6. Merge all the excitement-clips to form the video highlights
This approach is best for sports videos. A minimal sketch of the steps above is shown below.
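The sketch below walks through steps 1-6 with moviepy and numpy. The file name, chunk length, and excitement threshold are illustrative assumptions.

import numpy as np
from moviepy.editor import VideoFileClip, concatenate_videoclips

clip = VideoFileClip("match.mp4")                #step 1: input the video
audio = clip.audio.to_soundarray(fps=44100)      #step 2: extract the audio
signal = audio.mean(axis=1)                      #mix stereo down to mono

chunk_sec = 5
chunk_len = 44100 * chunk_sec
#steps 3-4: break into chunks and compute the sum of squared amplitudes
energies = [np.sum(np.square(signal[i:i + chunk_len]))
            for i in range(0, len(signal), chunk_len)]

#step 5: classify a chunk as excitement if its energy exceeds a threshold
threshold = np.mean(energies) + 2 * np.std(energies)
exciting = [i for i, e in enumerate(energies) if e > threshold]

#step 6: merge the excitement clips into the video highlights
highlights = concatenate_videoclips(
    [clip.subclip(i * chunk_sec, min((i + 1) * chunk_sec, clip.duration))
     for i in exciting])
highlights.write_videofile("highlights.mp4")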

2. Using OpenCV

Suppose we have videos in which there is no sound, like CCTV surveillance footage. Then the above approach will fail. Instead, OpenCV can be used to detect and track the objects of interest.
Suppose we want to make a highlight video from an ATM CCTV camera. Since in a 24-hour video transactions happen only during some hours, we first need to extract those clips from the main video.


3. Using NLP (Natural Language Processing)

In this approach we convert the sound into text, and if we find important text, we extract the corresponding clip.

Here is a step-by-step procedure:
1. Extract the audio from an input video
2. Transcribe the audio to text
3. Apply Extractive based Summarization techniques on text to identify the most important phrases
4. Extract the clips of the corresponding important phrases to generate highlights (a rough sketch of these steps is shown below)
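Below is a rough sketch of these steps using moviepy and the SpeechRecognition library. A simple keyword check stands in for a full extractive-summarization step, and the file name, chunk length, and keywords are illustrative assumptions.

import speech_recognition as sr
from moviepy.editor import VideoFileClip, concatenate_videoclips

clip = VideoFileClip("match.mp4")
clip.audio.write_audiofile("audio.wav")          #step 1: extract the audio

recognizer = sr.Recognizer()
keywords = {"wicket", "boundary", "six", "out"}  #assumed important phrases
chunk_sec, keep = 10, []

with sr.AudioFile("audio.wav") as source:
    t = 0
    while t < clip.duration:
        audio = recognizer.record(source, duration=chunk_sec)
        try:
            text = recognizer.recognize_google(audio)   #step 2: transcribe
        except (sr.UnknownValueError, sr.RequestError):
            text = ""
        #step 3: keep the chunk if it mentions an important phrase
        if any(k in text.lower() for k in keywords):
            keep.append(clip.subclip(t, min(t + chunk_sec, clip.duration)))
        t += chunk_sec

#step 4: merge the kept clips into the highlights
if keep:
    concatenate_videoclips(keep).write_videofile("highlights.mp4")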
