In clinical gastrointestinal endoscopy, accurate estimation of the size of a lesion or finding, such as a polyp, is essential for diagnosis, e.g. for estimating the risk of malignancy. However, various studies have confirmed that endoscopic assessment of lesion size has inherent limitations and is subject to significant measurement errors. Image-based methods proposed for in vivo size measurement rely on reference objects, such as the endoscopic biopsy forceps. The problem becomes more challenging in capsule endoscopy, as capsules lack navigation and/or biopsy capabilities. To cope with this problem, we propose a methodology that requires only an endoscopic image, without any reference object, to estimate the size of an object of interest in it. First, the user defines a linear segment within the image. The methodology then uses the intrinsic parameters of the camera to project known 3D points onto the 2D image plane. Given these 3D-to-2D point correspondences, performing a measurement additionally requires a rough approximation of the distance between the object of interest and the camera. For this purpose, a convolutional neural network is used to generate depth maps from monocular images. The proposed methodology is validated experimentally in a 3D-printed model of the human colon. The results show that it is feasible to measure the size of various objects in endoscopic images with a mean absolute error of 1.10 mm ± 0.89 mm.
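To make the geometry concrete, the following is a minimal sketch of how a user-defined segment, the camera intrinsics, and a depth estimate could be combined into a metric length. It is not the authors' implementation: it assumes an ideal pinhole camera without lens distortion and back-projects the two segment endpoints instead of projecting known 3D points. The function names, the intrinsic values (`fx`, `fy`, `cx`, `cy`), and the depths are hypothetical and chosen only for illustration; in practice the depths would be read from the network's depth map at the endpoint pixels.

```python
import math


def backproject(u, v, depth_mm, fx, fy, cx, cy):
    """Map pixel (u, v) at a given depth to camera-frame coordinates in mm.

    Assumes an ideal pinhole model (no lens distortion), which is a
    simplification for wide-angle endoscopic optics.
    """
    x = (u - cx) * depth_mm / fx
    y = (v - cy) * depth_mm / fy
    return (x, y, depth_mm)


def segment_length_mm(p1, p2, depth1_mm, depth2_mm, fx, fy, cx, cy):
    """Estimate the metric length of the image segment p1 -> p2.

    p1 and p2 are (u, v) pixel coordinates of the segment endpoints;
    depth1_mm and depth2_mm are the estimated camera-to-surface
    distances at those pixels (e.g. sampled from a depth map).
    """
    a = backproject(p1[0], p1[1], depth1_mm, fx, fy, cx, cy)
    b = backproject(p2[0], p2[1], depth2_mm, fx, fy, cx, cy)
    return math.dist(a, b)


# Hypothetical intrinsics (in pixels) and depths (in mm), for illustration only.
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0
length = segment_length_mm((300, 240), (340, 240), 25.0, 25.0, fx, fy, cx, cy)
print(length)  # 2.0
```

Under this model the estimated length scales linearly with the depth estimate, so a relative error in depth translates into the same relative error in size. This is one reason a reasonable approximation of the camera-to-object distance matters for the measurement.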