Geograph British Isles :: Academic & Research Image DataSets

A large number of user-contributed, geolocated, geographically themed images of Great Britain and Ireland, presented as reusable datasets for academic research and ideally suited to computer vision projects.


All datasets on this page are © Copyright Geograph Project and licensed for reuse under this Creative Commons Licence (Attribution-ShareAlike).
You are free:
 to Share - to copy, distribute and transmit the work
 to Remix - to adapt the work
under the following conditions:
 Attribution - You must attribute the work in the manner specified by the author or licensor
 Share Alike - If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.

Notable Datasets Available

⚠ We are not providing all the direct download links here. Contact Us if you want to download one of the datasets that is not linked, or fill out this form if you want a bespoke dataset.

A. ScenicOrNot Dataset

Images originally selected 12th Feb 2009; consists of 212,057 images (aiming for one photo per square, Great Britain only). ~22GB, 640px nominal resolution.

A copy of the images provided to, and rated by, the ScenicOrNot project. This dataset matches the images to a static snapshot of ScenicOrNot votes from 27th July 2014. We also have a list of all the images (about 1M) originally supplied to ScenicOrNot (SoN), but SoN was responsible for selecting the final images featured in the project (i.e. choosing one image per square).

The dataset has been used in a number of studies over time; see http://scenicornot.datasciencelab.co.uk/faq

B. Geograph London Images

Dump created 19th Oct 2016, consisting of 270,570 images in the Greater London area.

A selection of most Geograph-classified images within an arbitrary bounding box around Greater London.

Used here: https://datadryad.org/resource/doi:10.5061/dryad.rq4s3 (the London-Scenic-Predictions.csv file provides predictions for images from this dataset), and plotted on a map: https://www.geograph.org.uk/blog/272

C. Geograph Machine Learning Dataset 1 - beauty

Dump created 4th Aug 2017, consisting of 58,210 images from our own 'Showcase Gallery'. ~43GB, most images larger than 1000px.

Our Showcase Gallery lets users vote on images, roughly on 'quality', to surface nice photos. It is a fairly arbitrary cross-section of images, biased towards higher quality (as images have to be nominated in the first place), but it includes low-voted images too. The dataset includes a snapshot of the actual votes at that time.

(Work on this dataset is as yet unpublished.)

D. Geograph Machine Learning Dataset 2 - location based

Dump created 4th Dec 2021, currently consisting of 100k images randomly spread over the whole of Britain and Ireland. Images resized and cropped to 255x255px.

We selected images more or less at random from all over, to get a good spread. The current sample is only 100k images, as a prototype; we intend to release more selections if it works out. We pre-downsized and cropped the images to 255x255px squares, as that seems a common size for vision processing, while also minimising the size of the dataset.
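The page does not say exactly how the square crop was taken. As a minimal sketch, assuming a centred crop (the largest square in the middle of the photo) followed by a downscale to the target side, the box arithmetic looks like this; `center_square_box` and `scale_factor` are hypothetical helpers, not part of any Geograph tooling, and the pixel resampling itself is left out.

```python
def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centred square.

    ASSUMPTION: the crop is centred; the actual Geograph crop may differ.
    """
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def scale_factor(side: int, target: int = 255) -> float:
    """Factor by which a side x side square shrinks to target x target."""
    return target / side


# Example: a 640x480 landscape photo keeps its central 480x480 square
box = center_square_box(640, 480)
print(box)  # (80, 0, 560, 480)
```

With Pillow, a box like this could be passed to `Image.crop` and the result to `Image.resize((255, 255))`; the same arithmetic applies with `target=224` for the labelled training datasets.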

If you just want to preview it, download a tiny sample of just 10 images (226KB .zip). Download the full zip files via the dataset download page (about 220MB per 10k images).

Note: the downloads come with a metadata.csv file containing the contributor name and other details such as title, date and lat/long, as you need to credit the photographer when reproducing images. The file is deliberately minimalist; more data about the images can be downloaded from our data dumps.
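A minimal sketch of turning metadata.csv into photo credits with Python's standard `csv` module. The column names `title` and `realname` are hypothetical placeholders chosen for illustration; check the header row of the actual file and adjust `TITLE_COL` and `NAME_COL` to match.

```python
import csv
from typing import TextIO

# ASSUMPTION: hypothetical column names; replace with the real headers
# from metadata.csv before use.
TITLE_COL = "title"
NAME_COL = "realname"


def attribution_lines(f: TextIO) -> list[str]:
    """Build one credit line per image row in an open metadata.csv file."""
    lines = []
    for row in csv.DictReader(f):
        title = row[TITLE_COL].strip()
        name = row[NAME_COL].strip()
        # Credit the photographer and state the (BY-SA) licence terms
        lines.append(f'"{title}" © Copyright {name}, CC BY-SA')
    return lines
```

In practice the file would be opened with `open("metadata.csv", newline="", encoding="utf-8")` and passed straight in; the encoding is also an assumption.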

E. Geograph Vision Datasets - Labelled Training data

We have produced a range of datasets with images arranged in folders, intended for transfer learning, mainly in liner.ai. Images have been cropped and resized to 224x224 squares, since Liner would do this anyway for the EfficientNet/MobileNet models it uses.
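The exact folder layout is not described here. A common convention for image-classification training data, and the one assumed in this sketch, is one subfolder per label (root/<label>/<image>). `list_labelled_images` and `label_counts` are hypothetical helpers that recover (path, label) pairs and per-label counts from such a tree using only the standard library.

```python
from collections import Counter
from pathlib import Path

# ASSUMPTION: images are stored as root/<label>/<file> with these suffixes.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_labelled_images(root: str) -> list[tuple[str, str]]:
    """Return (image_path, label) pairs, taking each label from its folder name."""
    pairs = []
    for label_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for img in sorted(label_dir.iterdir()):
            # Skip non-image files such as notes or metadata
            if img.is_file() and img.suffix.lower() in IMAGE_SUFFIXES:
                pairs.append((str(img), label_dir.name))
    return pairs


def label_counts(pairs: list[tuple[str, str]]) -> Counter:
    """Count images per label, e.g. to spot heavily imbalanced classes."""
    return Counter(label for _, label in pairs)
```

Checking the per-label counts before training is worthwhile, since labels such as tags are applied by only some contributors and some classes may be sparse.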

Geograph has lots of pre-labelled data; it just doesn't have the labels for ALL images. For example, only some contributors apply tags to their images. So we were experimenting to see whether we could train our own models on this data, to potentially suggest labels for other images.

Labels include 'tag', 'context', 'subject', 'category', 'place' (a model could potentially identify notable places from an image!) and 'type' (our own moderation classification). About 3M images have been set up in these training datasets. Contact Us if you are interested in gaining access, and in particular if you would be interested in developing working models on these (that you are then willing to share!).

F. Geograph Text Datasets - Labelled Training data

On a much smaller scale, we have also created some textual datasets for testing transfer learning with liner.ai.

For example, testing whether a 'subject' classification could be predicted from the text (title/description) of an image. Again, contact us if you are interested in developing this.



Contact Us if you have any questions about the above datasets, or wish to download any of the datasets we haven't linked above.


Remember, these are just a selection of the static snapshots we have created; we have many more images, and can produce bespoke datasets on request.