How to analyze PDF documents with an ONLYOFFICE macro
In today's fast-paced digital environment, writers, editors, and content creators often struggle to get meaningful insights into their documents. Understanding metrics such as readability, word frequency, and structural balance can significantly improve document quality, but manual analysis is time-consuming and inconsistent. In this blog post, we'll show you how to build an ONLYOFFICE macro that analyzes your documents automatically and generates a comprehensive report.
Building the document analysis macro
Let's break our macro into functional components and explain how each part works.
Setting up the main function
The core of our macro is the analyzeDocument() function, which orchestrates the entire analysis process:
function analyzeDocument() {
    try {
        // Get document and all text
        var oDocument = Api.GetDocument();
        var allText = "";
        var paragraphs = oDocument.GetAllParagraphs();
        // Check if document is empty
        if (paragraphs.length === 0) {
            console.log("Warning: Document is empty or no paragraphs found for analysis.");
            return;
        }
        // Collect all text
        paragraphs.forEach(function(paragraph) {
            allText += paragraph.GetText() + " ";
        });
        // Perform analyses
        var stats = calculateBasicStats(allText, paragraphs);
        var advancedStats = calculateAdvancedStats(allText, stats);
        var commonWords = findCommonWords(allText, 10);
        // Create report
        createAndAddReport(oDocument, stats, advancedStats, commonWords);
        // Log success
        console.log("Success: Document analysis completed. Report added to the end of the document.");
    } catch (error) {
        console.log("Error: " + error.message);
    }
}
This function first collects all the text in the document, then passes it to specialized analysis functions, and finally creates a report. The try-catch block ensures that any error is caught and logged to the console instead of interrupting the macro.
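The text-collection step can be tried outside the editor. The sketch below is only an illustration and not part of the macro: collectText is a hypothetical helper name, and the plain objects stand in for the paragraph objects returned by GetAllParagraphs(), assuming only that each one exposes GetText() as in the code above.

```javascript
// Hypothetical helper (not part of the macro): joins paragraph text the same
// way analyzeDocument() does, adding a single space after each paragraph.
function collectText(paragraphs) {
    var allText = "";
    paragraphs.forEach(function (paragraph) {
        allText += paragraph.GetText() + " ";
    });
    return allText;
}

// Stand-in objects that mimic the one method the macro relies on.
var fakeParagraphs = [
    { GetText: function () { return "First paragraph."; } },
    { GetText: function () { return "Second paragraph."; } }
];

console.log(JSON.stringify(collectText(fakeParagraphs)));
// prints "First paragraph. Second paragraph. "
```

Note that the space appended after each paragraph becomes part of the collected text, so it is included in the character count with spaces.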
Calculating basic statistics
The calculateBasicStats() function processes the text to extract fundamental metrics:
function calculateBasicStats(text, paragraphs) {
    // Word count
    var words = text.split(/\s+/).filter(function(word) {
        return word.length > 0;
    });
    var wordCount = words.length;
    // Sentence count
    var sentences = text.split(/[.!?]+/).filter(function(sentence) {
        return sentence.trim().length > 0;
    });
    var sentenceCount = sentences.length;
    // Paragraph count
    var paragraphCount = paragraphs.length;
    // Character count
    var charCountWithSpaces = text.length;
    var charCountWithoutSpaces = text.replace(/\s+/g, "").length;
    // Line count (approximate, assuming about 70 characters per line)
    var lineCount = Math.ceil(charCountWithSpaces / 70);
    return {
        wordCount: wordCount,
        sentenceCount: sentenceCount,
        paragraphCount: paragraphCount,
        charCountWithSpaces: charCountWithSpaces,
        charCountWithoutSpaces: charCountWithoutSpaces,
        lineCount: lineCount,
        words: words,
        sentences: sentences
    };
}
This function splits the text into words and sentences, counts the paragraphs, and calculates the character counts and an approximate line count based on 70 characters per line.
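To see how these splitting rules behave, here is a small sketch that applies the same two regular expressions to a sample sentence. The function name countWordsAndSentences is hypothetical and exists only for this illustration.

```javascript
// Same splitting rules as calculateBasicStats(), reduced to the two counts.
function countWordsAndSentences(text) {
    var words = text.split(/\s+/).filter(function (word) {
        return word.length > 0;
    });
    var sentences = text.split(/[.!?]+/).filter(function (sentence) {
        return sentence.trim().length > 0;
    });
    return { wordCount: words.length, sentenceCount: sentences.length };
}

var sample = "Dr. Smith arrived. He was late!";
var counts = countWordsAndSentences(sample);
console.log(counts.wordCount);     // 6
console.log(counts.sentenceCount); // 3, because the period in "Dr." also ends a "sentence"
```

Because every period counts as a sentence boundary, abbreviations can inflate the sentence count, so the result is best read as an approximation.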
Running advanced analysis
For deeper insights, the calculateAdvancedStats() function computes more sophisticated metrics:
function calculateAdvancedStats(text, basicStats) {
    // Average sentence length (in words)
    var avgWordsPerSentence = basicStats.wordCount / Math.max(1, basicStats.sentenceCount);
    // Average paragraph length (in words)
    var avgWordsPerParagraph = basicStats.wordCount / Math.max(1, basicStats.paragraphCount);
    // Average word length (in characters)
    var totalWordLength = basicStats.words.reduce(function(sum, word) {
        return sum + word.length;
    }, 0);
    var avgWordLength = totalWordLength / Math.max(1, basicStats.wordCount);
    // Readability score: Flesch Reading Ease formula, simplified by using
    // average characters per word in place of syllables per word.
    // Reusing avgWordLength avoids a division by zero when there are no words.
    var readabilityScore = 206.835 - 1.015 * avgWordsPerSentence - 84.6 * avgWordLength;
    // Estimated reading time (minutes, at about 200 words per minute)
    var readingTimeMinutes = Math.ceil(basicStats.wordCount / 200);
    return {
        avgWordsPerSentence: avgWordsPerSentence,
        avgWordsPerParagraph: avgWordsPerParagraph,
        avgWordLength: avgWordLength,
        readabilityScore: readabilityScore,
        readingTimeMinutes: readingTimeMinutes
    };
}
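This calculates the average sentence and paragraph lengths, a readability score, and the estimated reading time. Note that the readability score plugs the average word length in characters into the Flesch Reading Ease formula, which normally uses syllables per word, so treat it as a relative indicator rather than a standard score.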
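For comparison, the standard Flesch Reading Ease formula uses syllables per word rather than characters per word. The following sketch is one possible variation, not the macro's method: estimateSyllables and fleschReadingEase are hypothetical names, and counting vowel groups is only a rough heuristic for English syllables.

```javascript
// Hypothetical helper: rough English syllable estimate that counts groups of
// consecutive vowels. It is a heuristic and will miscount some words.
function estimateSyllables(word) {
    var groups = word.toLowerCase().match(/[aeiouy]+/g);
    return Math.max(1, groups ? groups.length : 0);
}

// Flesch Reading Ease with estimated syllables per word.
function fleschReadingEase(words, sentenceCount) {
    var syllables = words.reduce(function (sum, word) {
        return sum + estimateSyllables(word);
    }, 0);
    var wordCount = Math.max(1, words.length);
    var wordsPerSentence = wordCount / Math.max(1, sentenceCount);
    var syllablesPerWord = syllables / wordCount;
    return 206.835 - 1.015 * wordsPerSentence - 84.6 * syllablesPerWord;
}

var sampleWords = ["The", "cat", "sat", "on", "the", "mat"];
console.log(fleschReadingEase(sampleWords, 1).toFixed(2));
```

With six one-syllable words in a single sentence, the inputs are 6 words per sentence and 1 syllable per word, which places the score in the "very easy" range of the usual scale.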
Analyzing word frequency
The findCommonWords() function identifies the most frequently used words:
function findCommonWords(text, limit) {
    // Clean text (strip punctuation, including question marks) and convert to lowercase
    var cleanText = text.toLowerCase().replace(/[.,\/#!?$%\^&\*;:{}=\-_`~()]/g, "");
    // Split into words, ignoring words of three characters or fewer
    var words = cleanText.split(/\s+/).filter(function(word) {
        return word.length > 3;
    });
    // Calculate word frequencies (a prototype-free object, so that words such
    // as "constructor" do not collide with built-in properties)
    var wordFrequency = Object.create(null);
    words.forEach(function(word) {
        wordFrequency[word] = (wordFrequency[word] || 0) + 1;
    });
    // Filter stop words
    var stopWords = ["this", "that", "with", "from", "have", "been"];
    stopWords.forEach(function(stopWord) {
        delete wordFrequency[stopWord];
    });
    // Sort by frequency, most frequent first
    var sortedWords = Object.keys(wordFrequency).sort(function(a, b) {
        return wordFrequency[b] - wordFrequency[a];
    });
    // Return top N words
    return sortedWords.slice(0, limit).map(function(word) {
        return { word: word, frequency: wordFrequency[word] };
    });
}
This function strips punctuation, ignores words of three characters or fewer, filters out common stop words, and returns the most frequently used words in the document.
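The built-in stop-word list covers only English. As a rough sketch of one way to adapt it, the variant below takes the stop-word list as a parameter and breaks ties alphabetically so the order is predictable. The name findCommonWordsWith and its third parameter are assumptions made for this example and are not part of the macro.

```javascript
// Hypothetical variant of findCommonWords(): the stop-word list is passed in,
// and words with equal counts are ordered alphabetically.
function findCommonWordsWith(text, limit, stopWords) {
    // Prototype-free lookup tables, so words like "constructor" are safe keys.
    var isStopWord = Object.create(null);
    stopWords.forEach(function (word) {
        isStopWord[word] = true;
    });

    var frequency = Object.create(null);
    text.toLowerCase()
        .replace(/[.,\/#!?$%\^&\*;:{}=\-_`~()]/g, "")
        .split(/\s+/)
        .filter(function (word) {
            return word.length > 3 && !isStopWord[word];
        })
        .forEach(function (word) {
            frequency[word] = (frequency[word] || 0) + 1;
        });

    return Object.keys(frequency)
        .sort(function (a, b) {
            if (frequency[b] !== frequency[a]) {
                return frequency[b] - frequency[a];
            }
            return a < b ? -1 : (a > b ? 1 : 0);
        })
        .slice(0, limit)
        .map(function (word) {
            return { word: word, frequency: frequency[word] };
        });
}

var top = findCommonWordsWith("Reports with charts, reports from macros.", 2, ["with", "from"]);
console.log(JSON.stringify(top));
// prints [{"word":"reports","frequency":2},{"word":"charts","frequency":1}]
```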
Generating the report
Finally, the createAndAddReport() function compiles and formats all the analysis results:
function createAndAddReport(oDocument, basicStats, advancedStats, commonWords) {
    // Add new page
    var oParagraph = Api.CreateParagraph();
    oParagraph.AddPageBreak();
    oDocument.AddElement(oDocument.GetElementsCount(), oParagraph);
    // Add title
    var oHeading = Api.CreateParagraph();
    oHeading.AddText("DOCUMENT ANALYSIS REPORT");
    oDocument.AddElement(oDocument.GetElementsCount(), oHeading);
    // Add basic statistics section
    var oSubHeading = Api.CreateParagraph();
    oSubHeading.AddText("BASIC STATISTICS");
    oDocument.AddElement(oDocument.GetElementsCount(), oSubHeading);
    // Add statistics content
    // ... (code that adds individual statistics)
    // Add advanced analysis section
    // ... (code that adds advanced metrics)
    // Add word frequency section
    // ... (code that adds word frequency list)
    // Add footer
    var oFootnotePara = Api.CreateParagraph();
    oFootnotePara.AddText("This report was generated by OnlyOffice Document Statistics and Analysis Tool on " +
        new Date().toLocaleString() + ".");
    oDocument.AddElement(oDocument.GetElementsCount(), oFootnotePara);
}
This function creates a structured report at the end of the document, starting on a new page, with all the analysis results.
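The report code repeats one pattern: create a paragraph, add text, append it to the document. A minimal sketch of factoring that pattern out is shown here. Both buildReportLines and appendLines are hypothetical helpers, the report is shortened to a few lines, and api is passed in as a parameter (inside the macro it would be the global Api object) so the logic can be exercised with stand-in objects.

```javascript
// Hypothetical helper: turns the analysis results into plain report lines.
function buildReportLines(basicStats, advancedStats, commonWords) {
    var lines = [
        "BASIC STATISTICS",
        "• Word Count: " + basicStats.wordCount,
        "• Sentence Count: " + basicStats.sentenceCount,
        "ADVANCED ANALYSIS",
        "• Average Sentence Length: " + advancedStats.avgWordsPerSentence.toFixed(2) + " words",
        "MOST FREQUENTLY USED WORDS"
    ];
    commonWords.forEach(function (entry, index) {
        lines.push((index + 1) + ". " + entry.word + " (" + entry.frequency + " times)");
    });
    return lines;
}

// Hypothetical helper: appends one paragraph per line, using only calls that
// already appear in the macro (CreateParagraph, AddText, AddElement,
// GetElementsCount).
function appendLines(api, oDocument, lines) {
    lines.forEach(function (line) {
        var oParagraph = api.CreateParagraph();
        oParagraph.AddText(line);
        oDocument.AddElement(oDocument.GetElementsCount(), oParagraph);
    });
}
```

Inside the macro, the call might look like appendLines(Api, oDocument, buildReportLines(stats, advancedStats, commonWords)). Keeping the line building separate from the document calls also makes the text of the report easy to check without opening the editor.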
Complete macro code
Here is the complete macro code that you can copy and use:
(function() {
    // Main function - starts all operations
    function analyzeDocument() {
        try {
            // Get document and all text
            var oDocument = Api.GetDocument();
            var allText = "";
            var paragraphs = oDocument.GetAllParagraphs();
            // Check if document is empty
            if (paragraphs.length === 0) {
                console.log("Warning: Document is empty or no paragraphs found for analysis.");
                return;
            }
            // Collect all text
            paragraphs.forEach(function(paragraph) {
                allText += paragraph.GetText() + " ";
            });
            // Calculate basic statistics
            var stats = calculateBasicStats(allText, paragraphs);
            // Perform advanced analysis
            var advancedStats = calculateAdvancedStats(allText, stats);
            // Find most common words
            var commonWords = findCommonWords(allText, 10);
            // Create and add report to the document
            createAndAddReport(oDocument, stats, advancedStats, commonWords);
            // Inform user
            console.log("Success: Document analysis completed. Report added to the end of the document.");
        } catch (error) {
            console.log("Error: An error occurred during processing: " + error.message);
        }
    }
    // Calculate basic statistics
    function calculateBasicStats(text, paragraphs) {
        // Word count
        var words = text.split(/\s+/).filter(function(word) {
            return word.length > 0;
        });
        var wordCount = words.length;
        // Sentence count
        var sentences = text.split(/[.!?]+/).filter(function(sentence) {
            return sentence.trim().length > 0;
        });
        var sentenceCount = sentences.length;
        // Paragraph count
        var paragraphCount = paragraphs.length;
        // Character count (with and without spaces)
        var charCountWithSpaces = text.length;
        var charCountWithoutSpaces = text.replace(/\s+/g, "").length;
        // Line count (approximate)
        var lineCount = Math.ceil(charCountWithSpaces / 70); // Approximately 70 characters/line
        return {
            wordCount: wordCount,
            sentenceCount: sentenceCount,
            paragraphCount: paragraphCount,
            charCountWithSpaces: charCountWithSpaces,
            charCountWithoutSpaces: charCountWithoutSpaces,
            lineCount: lineCount,
            words: words,
            sentences: sentences
        };
    }
    // Calculate advanced statistics
    function calculateAdvancedStats(text, basicStats) {
        // Average sentence length (in words)
        var avgWordsPerSentence = basicStats.wordCount / Math.max(1, basicStats.sentenceCount);
        // Average paragraph length (in words)
        var avgWordsPerParagraph = basicStats.wordCount / Math.max(1, basicStats.paragraphCount);
        // Average word length (in characters)
        var totalWordLength = basicStats.words.reduce(function(sum, word) {
            return sum + word.length;
        }, 0);
        var avgWordLength = totalWordLength / Math.max(1, basicStats.wordCount);
        // Readability score: Flesch Reading Ease formula, simplified by using
        // average characters per word in place of syllables per word
        var readabilityScore = 206.835 - 1.015 * avgWordsPerSentence - 84.6 * avgWordLength;
        // Estimated reading time (minutes)
        var readingTimeMinutes = Math.ceil(basicStats.wordCount / 200); // Average reading speed 200 words/minute
        return {
            avgWordsPerSentence: avgWordsPerSentence,
            avgWordsPerParagraph: avgWordsPerParagraph,
            avgWordLength: avgWordLength,
            readabilityScore: readabilityScore,
            readingTimeMinutes: readingTimeMinutes
        };
    }
    // Find most common words
    function findCommonWords(text, limit) {
        // Clean text (strip punctuation, including question marks) and convert to lowercase
        var cleanText = text.toLowerCase().replace(/[.,\/#!?$%\^&\*;:{}=\-_`~()]/g, "");
        // Split into words
        var words = cleanText.split(/\s+/).filter(function(word) {
            return word.length > 3; // Filter out very short words
        });
        // Calculate word frequencies (a prototype-free object, so that words
        // such as "constructor" do not collide with built-in properties)
        var wordFrequency = Object.create(null);
        words.forEach(function(word) {
            if (wordFrequency[word]) {
                wordFrequency[word]++;
            } else {
                wordFrequency[word] = 1;
            }
        });
        // Filter stop words (common English words)
        var stopWords = ["this", "that", "these", "those", "with", "from", "have", "been", "were", "they", "their", "what", "when", "where", "which", "there", "will", "would", "could", "should", "about", "also"];
        stopWords.forEach(function(stopWord) {
            if (wordFrequency[stopWord]) {
                delete wordFrequency[stopWord];
            }
        });
        // Sort by frequency
        var sortedWords = Object.keys(wordFrequency).sort(function(a, b) {
            return wordFrequency[b] - wordFrequency[a];
        });
        // Take top N words
        var topWords = sortedWords.slice(0, limit);
        // Return results as word-frequency pairs
        return topWords.map(function(word) {
            return {
                word: word,
                frequency: wordFrequency[word]
            };
        });
    }
    // Create and add report to document
    function createAndAddReport(oDocument, basicStats, advancedStats, commonWords) {
        // Add new page
        var oParagraph = Api.CreateParagraph();
        oParagraph.AddPageBreak();
        oDocument.AddElement(oDocument.GetElementsCount(), oParagraph);
        // Main title - highlighting in capital letters
        var oHeading = Api.CreateParagraph();
        oHeading.AddText("DOCUMENT ANALYSIS REPORT");
        oDocument.AddElement(oDocument.GetElementsCount(), oHeading);
        // Subheading - in capital letters
        var oSubHeading = Api.CreateParagraph();
        oSubHeading.AddText("BASIC STATISTICS");
        oDocument.AddElement(oDocument.GetElementsCount(), oSubHeading);
        // Add basic statistics
        var oStatsPara = Api.CreateParagraph();
        oStatsPara.AddText("• Word Count: " + basicStats.wordCount);
        oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
        oStatsPara = Api.CreateParagraph();
        oStatsPara.AddText("• Sentence Count: " + basicStats.sentenceCount);
        oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
        oStatsPara = Api.CreateParagraph();
        oStatsPara.AddText("• Paragraph Count: " + basicStats.paragraphCount);
        oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
        oStatsPara = Api.CreateParagraph();
        oStatsPara.AddText("• Character Count (with spaces): " + basicStats.charCountWithSpaces);
        oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
        oStatsPara = Api.CreateParagraph();
        oStatsPara.AddText("• Character Count (without spaces): " + basicStats.charCountWithoutSpaces);
        oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
        oStatsPara = Api.CreateParagraph();
        oStatsPara.AddText("• Estimated Line Count: " + basicStats.lineCount);
        oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
        // Advanced analysis title
        oSubHeading = Api.CreateParagraph();
        oSubHeading.AddText("ADVANCED ANALYSIS");
        oDocument.AddElement(oDocument.GetElementsCount(), oSubHeading);
        // Add advanced analysis results
        oStatsPara = Api.CreateParagraph();
        oStatsPara.AddText("• Average Sentence Length: " + advancedStats.avgWordsPerSentence.toFixed(2) + " words");
        oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
        oStatsPara = Api.CreateParagraph();
        oStatsPara.AddText("• Average Paragraph Length: " + advancedStats.avgWordsPerParagraph.toFixed(2) + " words");
        oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
        oStatsPara = Api.CreateParagraph();
        oStatsPara.AddText("• Average Word Length: " + advancedStats.avgWordLength.toFixed(2) + " characters");
        oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
        oStatsPara = Api.CreateParagraph();
        oStatsPara.AddText("• Readability Score: " + advancedStats.readabilityScore.toFixed(2));
        oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
        oStatsPara = Api.CreateParagraph();
        oStatsPara.AddText("• Estimated Reading Time: " + advancedStats.readingTimeMinutes + " minutes");
        oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
        // Common words title
        oSubHeading = Api.CreateParagraph();
        oSubHeading.AddText("MOST FREQUENTLY USED WORDS");
        oDocument.AddElement(oDocument.GetElementsCount(), oSubHeading);
        // We'll create a simple list instead of a table
        if (commonWords.length > 0) {
            for (var i = 0; i < commonWords.length; i++) {
                var oWordPara = Api.CreateParagraph();
                oWordPara.AddText((i + 1) + ". " + commonWords[i].word + " (" + commonWords[i].frequency + " times)");
                oDocument.AddElement(oDocument.GetElementsCount(), oWordPara);
            }
        } else {
            var oNoneFoundPara = Api.CreateParagraph();
            oNoneFoundPara.AddText("No frequently used words found.");
            oDocument.AddElement(oDocument.GetElementsCount(), oNoneFoundPara);
        }
        // Footer note
        var oFootnotePara = Api.CreateParagraph();
        oFootnotePara.AddText("This report was generated by OnlyOffice Document Statistics and Analysis Tool on " +
            new Date().toLocaleString() + ".");
        oDocument.AddElement(oDocument.GetElementsCount(), oFootnotePara);
    }
    // Run the macro
    analyzeDocument();
})();
To use this macro in ONLYOFFICE
- Open your document in ONLYOFFICE
- Go to the View tab and select Macros
- Create a new macro and paste the code
- Run the macro
- A detailed analysis report will be added to the end of your document
Now let's run our macro and see how it works!
This macro is a useful tool for professionals who want to automate text analysis and document reporting. We hope it will be a helpful addition to your toolkit.
We encourage you to explore the ONLYOFFICE API documentation to create your own custom macros or improve this one. If you have ideas for improvements or suggestions for new macros, don't hesitate to contact us. Your feedback helps us keep developing tools that make creating and editing documents more efficient.