For example, suppose the original videos A and B both show a person holding a mic and singing. After extracting the pics, there will be some in which the faces (in both A and B) are partially blocked by the mic or by the other, waving hand. Another example is people eating: some pics will show food being brought to the mouth. Should I still use these pics for training so that the eating actions are covered? Or do I only need pics that show the full, unblocked face from different angles?
Thanks.