An important issue in many forms of collaboration technology is how video can help the technology better meet its goals. This paper explores this question for difficulty awareness, which is motivated by academic and industrial collaboration scenarios in which unsolicited help is offered to programmers in difficulty. We performed experiments to determine how well difficulty can be automatically inferred by mining the interaction logs and/or videos of programmers. Our observations show that: (a) it is more effective to mine the videos to detect programmer postures rather than facial features; (b) posture-mining benefits from an individual model (training data for a developer is used only for that developer), while in contrast, log-mining benefits from a group model (data of all users are used for each user); (c) posture-mining alone (using an individual model) does not detect difficulties of "calm" programmers, who do not change postures when they are in difficulty; (d) log-mining alone (using a group model) does not detect difficulties of programmers who pause interaction when they are either in difficulty or taking a break; (e) overall, log-mining alone is more effective than posture-mining alone; (f) both forms of mining have high false-negative rates; and (g) multimedia/multimodal detection that mines both postures and logs using a group model yields low rates of both false positives and false negatives. These results imply that (a) when collaborators can be seen, either directly or through video, posture changes, though idiosyncratic, are important cues for inferring difficulty; (b) difficulty inferred automatically from both interaction logs and postures, when both are available, is an even more reliable indicator of difficulty; (c) video can play an important role in providing unsolicited help in both face-to-face and distributed collaboration; and (d) controlled public environments such as labs and war-rooms should be equipped with cameras that support posture-mining.
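The multimodal detection scheme summarized above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the feature names (`pause_seconds`, `posture_changes`), the one-standard-deviation thresholds, and the OR-fusion rule are all assumptions made for the example. It shows the group-model idea (thresholds fit on pooled data from all programmers) and why fusing the two modalities covers both "calm" programmers and programmers who pause interaction.

```python
# Hypothetical sketch of a multimodal difficulty detector combining
# log-mining and posture-mining with a group model. All feature names
# and thresholds are illustrative assumptions, not the paper's method.

from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Window:
    """Features extracted over one fixed-length time window for one programmer."""
    pause_seconds: float   # longest gap between logged interactions (log-mining)
    posture_changes: int   # posture shifts detected in the video (posture-mining)


def fit_group_thresholds(training: list[Window]) -> tuple[float, float]:
    """Group model: pool windows from ALL programmers to pick thresholds,
    flagging values more than one standard deviation above the pooled mean."""
    pauses = [w.pause_seconds for w in training]
    shifts = [w.posture_changes for w in training]
    return (mean(pauses) + stdev(pauses), mean(shifts) + stdev(shifts))


def in_difficulty(w: Window, pause_thr: float, shift_thr: float) -> bool:
    """Fuse both modalities: a strong signal in EITHER one suggests difficulty.
    The log channel catches 'calm' programmers who stop interacting without
    shifting posture; the posture channel catches programmers who keep typing
    (or pause only for breaks) but fidget when stuck."""
    return w.pause_seconds > pause_thr or w.posture_changes > shift_thr


# Toy pooled training data: (pause_seconds, posture_changes) per window.
train = [Window(5, 1), Window(8, 0), Window(60, 6), Window(7, 2), Window(10, 1)]
pause_thr, shift_thr = fit_group_thresholds(train)

print(in_difficulty(Window(90, 0), pause_thr, shift_thr))  # long pause, calm posture
print(in_difficulty(Window(6, 7), pause_thr, shift_thr))   # steady typing, fidgeting
print(in_difficulty(Window(5, 1), pause_thr, shift_thr))   # no signal in either channel
```

In this sketch the fusion is a simple disjunction; a real system would likely train a joint classifier over both feature sets, but the structure (shared group thresholds, two complementary channels) mirrors the combination the results describe.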