Is there a way to get 10 or more lines of text from each Word / PDF document stored in a document library in a site under a site collection? I need this text to pass into an NLP (Natural Language Processing) tool.
My thoughts:
Option 1) Using server-side code, loop through each document in the library and read the text up to a certain size (20 KB, 40 KB, etc.). I know this works for Word docs, but I'm not sure whether it works for .pdf files.
Option 2) Use the OOTB search: entering * in the search box returns all the Word docs and PDFs, but with this approach I'm getting only 3 lines of text per result. I'm researching whether I can change the display templates to get more text.
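To make the "read up to a certain size" step in Option 1 concrete, here is a minimal sketch in Python. It is for illustration only and is not SharePoint server-side code. It assumes the text of each document has already been extracted by some other means, and the names `head_of_text` and `collect_snippets` are hypothetical helpers, not part of any SharePoint API.

```python
def head_of_text(text, max_lines=10, max_bytes=20 * 1024):
    """Return the first non-empty lines of `text`, within a line and a byte budget.

    Hypothetical helper: `text` is assumed to be plain text that was already
    extracted from a Word or PDF document.
    """
    kept = []
    used = 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines do not count toward the line limit
        size = len(line.encode("utf-8")) + 1  # +1 for the joining newline
        if len(kept) >= max_lines or used + size > max_bytes:
            break  # stop at whichever budget is hit first
        kept.append(line)
        used += size
    return "\n".join(kept)


def collect_snippets(documents, max_lines=10, max_bytes=20 * 1024):
    """Map each document name to its truncated text.

    `documents` is assumed to be an iterable of (name, extracted_text) pairs.
    """
    return {
        name: head_of_text(text, max_lines, max_bytes)
        for name, text in documents
    }


if __name__ == "__main__":
    sample = "\n".join(f"line {i}" for i in range(1, 31))
    print(collect_snippets([("report.docx", sample)]))
```

The byte budget is checked per whole line, so a first line that is longer than `max_bytes` yields an empty snippet rather than a cut-off line.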
Any help or thoughts would be much appreciated.
Vijay Ji