Channel: SharePoint 2013 - Development and Programming forum

Extract more than 10 lines of text from each Word / PDF document in a document library


Is there a way to get more than 10 lines of text from each Word / PDF document stored in a document library in a site under a site collection? I need this text to pass into an NLP (Natural Language Processing) tool.

My thoughts so far:

Option 1) Using server-side code, loop through each document in the library and read the text up to a certain size (20 KB, 40 KB, etc.). I know this works for Word docs, but I'm not sure whether it works for .pdf files.
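
To make the "read up to a certain size" step of Option 1 concrete, here is a minimal sketch of just the truncation part. It assumes the document text has already been extracted to a plain string by some other means (it does not read Word or PDF files itself). The function name `first_lines` and its parameters are hypothetical and only for illustration.

```python
def first_lines(text: str, max_lines: int = 10, max_bytes: int = 20 * 1024) -> str:
    """Return up to max_lines non-empty lines of text, capped at max_bytes of UTF-8.

    Hypothetical helper: the limits mirror the "10 lines" and "20kb" figures
    from the post and can be raised as needed.
    """
    kept = []
    used = 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines so they do not count toward the limit
        size = len(line.encode("utf-8")) + 1  # +1 for the newline that joins lines
        if used + size > max_bytes:
            break  # stop before exceeding the byte budget
        kept.append(line)
        used += size
        if len(kept) == max_lines:
            break  # enough lines collected
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "Title\n\nFirst paragraph.\nSecond paragraph.\nThird paragraph."
    print(first_lines(sample, max_lines=3))
```

The same helper would apply to text from either file type, so the open question in Option 1 is only how to get the text out of the .pdf files in the first place.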

Option 2) Use the OOTB search: entering * in the search box returns all the Word docs and PDFs, but with this approach I'm getting only 3 lines of text per result. I'm researching whether I can change the display templates to get more text.

Any help or thoughts would be much appreciated.


Vijay Ji

