With the advance of Internet of Things (IoT) and artificial intelligence (AI) technology, a growing number of mobile computing applications have emerged, such as face recognition for access control. Yet most artificial intelligence of things (AIoT) products in our homes rely on cloud service APIs. All of the raw data are sent wirelessly to a server for cloud computing, which may lead to privacy issues and high energy consumption. In this study, we propose a novel low-cost serverless access control system with multimodal inference capability. The system implements practical on-device face recognition and cost-efficient dynamic gesture recognition for liveness detection. We apply transfer learning to a MobileNetV2 model using task-specific face data and use a random forest model for one-dimensional gesture recognition. Both tiny machine learning (TinyML) models are successfully deployed on a low-cost microcontroller (MCU). The multimodal inference performance proves excellent, and a sequential inference mechanism further improves the robustness of the system. A low-cost prototype is assembled for field tests. A compact all-in-one product based on the proposed system, with a cute minion theme, has also been designed. Energy consumption is also measured in detail for different MCU operations. The system provides a valuable reference for realizing pervasive smart sensing and thus contributes to the bright future of AIoT.