The BSU Bangla Dataset is an offline handwriting dataset of Bangla, one of the world's major scripts. Its primary objective is to foster research on offline Bangla handwritten text recognition. The dataset's easy availability and simple structure are intended to help the research community develop and test such recognizers. The dataset is an anonymous, voluntary contribution from many people, and acquisition is still ongoing. A strong handwritten text recognizer will help to store handwritten archival literature and documents digitally, and will contribute to digital automation in many ways, such as digital character conversion, translation, content-based image retrieval, keyword spotting, signboard translation, text-to-speech conversion, scene image analysis, and postal sorting.
Users are free to share, copy, distribute and use the dataset; to create or produce works from the dataset; and to adapt, modify, transform and build upon the dataset, as long as they attribute any public use of the dataset, or of works produced from the dataset, by referencing the author(s) and the DOI link. For any use or redistribution of the dataset, or of works produced from it, users must make clear to others the license of the dataset and keep intact any notices on the original dataset. If users publicly use any adapted version of this dataset, or works produced from an adapted dataset, they must also offer that adapted dataset under this license. If users redistribute the dataset, or an adapted version of it, they may use technological measures that restrict the work (such as DRM) as long as they also redistribute a version without such measures. This dataset may not be used for commercial purposes. If interested in commercial licensing, contact (208) 426-5765.
This research is supported by a Graduate Assistantship, awarded to Nishatul Majid, funded by the Graduate College at Boise State University through the Department of Electrical and Computer Engineering.
BOISE STATE UNIVERSITY MAKES NO REPRESENTATIONS ABOUT THE SUITABILITY OF THE INFORMATION CONTAINED IN OR PROVIDED AS PART OF THE SYSTEM FOR ANY PURPOSE. ALL SUCH INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. BOISE STATE UNIVERSITY HEREBY DISCLAIMS ALL WARRANTIES AND CONDITIONS WITH REGARD TO THIS INFORMATION, INCLUDING ALL WARRANTIES AND CONDITIONS OF MERCHANTABILITY, WHETHER EXPRESS, IMPLIED OR STATUTORY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT.
IN NO EVENT SHALL BOISE STATE UNIVERSITY BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF INFORMATION AVAILABLE FROM THE SYSTEM.
THE INFORMATION PROVIDED BY THE SYSTEM COULD INCLUDE TECHNICAL INACCURACIES OR TYPOGRAPHICAL ERRORS. CHANGES ARE PERIODICALLY ADDED TO THE INFORMATION HEREIN. COMPANY AND/OR ITS RESPECTIVE SUPPLIERS MAY MAKE IMPROVEMENTS AND/OR CHANGES IN THE PRODUCT(S) AND/OR THE PROGRAM(S) DESCRIBED HEREIN AT ANY TIME, WITH OR WITHOUT NOTICE TO YOU.
BOISE STATE UNIVERSITY DOES NOT MAKE ANY ASSURANCES WITH REGARD TO THE ACCURACY OF THE RESULTS OR OUTPUT THAT DERIVES FROM USE OF THE SYSTEM.
Available at: http://works.bepress.com/elisa_barney_smith/134/
Update Note 09/05/2019: This release contains 150 pages of handwritten essays and 250 pages of handwritten isolated character documents. Each page has been acquired both with a handheld cell-phone camera and with a desktop scanner. The same document appears in both the camera and the scanner dataset under the same name.
Update Note 02/25/2020: In this release a folder named Conjunct has been added. This folder has 70 pages containing words with the most frequently used conjuncts in Bangla. These are all scanned at 300 dpi. We also added 100 more essay images to the previous dataset, in both cell-phone camera and scanner versions, with ground truth metadata.
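Because the camera and scanner versions of a document share the same name, the two acquisitions can be matched by file name. The following is a minimal Python sketch of that pairing, not part of the dataset itself. The function name, the folder arguments and the list of image extensions are assumptions made for illustration; the actual folder layout and file formats should be checked against the downloaded files.

```python
from pathlib import Path

# Assumed image extensions; adjust to the formats actually present in the download.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def pair_by_name(camera_dir, scanner_dir):
    """Match camera and scanner images that share a name.

    Returns a dict mapping each shared name (relative path without the
    file extension) to a (camera_path, scanner_path) tuple. Files that
    appear in only one of the two folders are left out.
    """

    def index(folder):
        root = Path(folder)
        return {
            p.relative_to(root).with_suffix("").as_posix(): p
            for p in root.rglob("*")
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        }

    camera = index(camera_dir)
    scanner = index(scanner_dir)
    shared = sorted(camera.keys() & scanner.keys())
    return {name: (camera[name], scanner[name]) for name in shared}
```

For example, `pair_by_name("Camera", "Scanner")` would return the matched pairs for two folders with those (hypothetical) names, which can then be used to compare recognizer performance across the two acquisition methods on the same pages.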
Due to the size of this dataset's component files, if you would prefer to access this data via Globus, please email ScholarWorks@boisestate.edu.