Humans use their faces, hands, and bodies as an integral part of their communication with others. For computers to interact intelligently with human users, they should be able to recognize emotions by analyzing the user's affective state, physiology, and behavior. Multimodal interfaces allow humans to interact with machines through multiple modalities such as speech, facial expression, gesture, and gaze. In this paper, we present an overview of research on face and body gesture analysis and recognition. Making human-computer interfaces truly natural requires technology that tracks human movement, body behavior, and facial expression, and interprets these cues in affective terms. Accordingly, we present a vision-based framework that combines facial expression and body gesture for multimodal HCI.