How to analyze PDF forms with an ONLYOFFICE macro
In today's fast-moving digital environment, writers, editors, and content creators often struggle to get meaningful insight into their documents. Understanding metrics such as readability, word frequency, and structural balance can considerably improve document quality, but manual analysis is time-consuming and inconsistent. In this blog post, we show how to build an ONLYOFFICE macro that automatically analyzes your documents and generates a detailed report.
Building the document analysis macro
Let's break the macro down into its functional parts and explain how each one works.
Setting up the main function
The core of the macro is the analyzeDocument() function, which orchestrates the whole analysis process:
function analyzeDocument() {
  try {
    // Get document and all text
    var oDocument = Api.GetDocument();
    var allText = "";
    var paragraphs = oDocument.GetAllParagraphs();
    // Check if document is empty
    if (paragraphs.length === 0) {
      console.log("Warning: Document is empty or no paragraphs found for analysis.");
      return;
    }
    // Collect all text
    paragraphs.forEach(function(paragraph) {
      allText += paragraph.GetText() + " ";
    });
    // Perform analyses
    var stats = calculateBasicStats(allText, paragraphs);
    var advancedStats = calculateAdvancedStats(allText, stats);
    var commonWords = findCommonWords(allText, 10);
    // Create report
    createAndAddReport(oDocument, stats, advancedStats, commonWords);
    // Log success
    console.log("Success: Document analysis completed. Report added to the end of the document.");
  } catch (error) {
    console.log("Error: " + error.message);
  }
}
This function first collects all the text in the document, then passes it to specialized analysis functions, and finally creates a report. The try-catch block lets the macro handle errors gracefully: any exception is logged to the console instead of interrupting the macro.
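The text-collection step can be tried outside the editor with stand-in objects. The sketch below is only an illustration: collectText() and hasAnalyzableText() are hypothetical helper names, and the fake paragraphs mimic nothing but the GetText() method the macro relies on. It also checks the collected text itself, on the assumption that a document can contain paragraphs that are all empty.

```javascript
// Hypothetical helper: joins the text of every paragraph with a single space.
function collectText(paragraphs) {
  return paragraphs
    .map(function (paragraph) { return paragraph.GetText(); })
    .join(" ");
}

// Hypothetical helper: true only when there is something other than whitespace.
function hasAnalyzableText(text) {
  return text.trim().length > 0;
}

// Stand-in objects that mimic only the GetText() method used by the macro.
var fakeParagraphs = [
  { GetText: function () { return "First paragraph."; } },
  { GetText: function () { return ""; } },
  { GetText: function () { return "Second one."; } }
];

var text = collectText(fakeParagraphs);
console.log(JSON.stringify(text));    // "First paragraph.  Second one."
console.log(hasAnalyzableText(text)); // true

// A document whose only paragraph is empty yields no analyzable text.
console.log(hasAnalyzableText(collectText([fakeParagraphs[1]]))); // false
```

Inside the macro, the same helpers could be called with the array returned by oDocument.GetAllParagraphs().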
Calculating basic statistics
The calculateBasicStats() function processes the text to extract the fundamental metrics:
function calculateBasicStats(text, paragraphs) {
  // Word count
  var words = text.split(/\s+/).filter(function(word) {
    return word.length > 0;
  });
  var wordCount = words.length;
  // Sentence count
  var sentences = text.split(/[.!?]+/).filter(function(sentence) {
    return sentence.trim().length > 0;
  });
  var sentenceCount = sentences.length;
  // Paragraph count
  var paragraphCount = paragraphs.length;
  // Character count
  var charCountWithSpaces = text.length;
  var charCountWithoutSpaces = text.replace(/\s+/g, "").length;
  // Line count (approximate)
  var lineCount = Math.ceil(charCountWithSpaces / 70);
  return {
    wordCount: wordCount,
    sentenceCount: sentenceCount,
    paragraphCount: paragraphCount,
    charCountWithSpaces: charCountWithSpaces,
    charCountWithoutSpaces: charCountWithoutSpaces,
    lineCount: lineCount,
    words: words,
    sentences: sentences
  };
}
This function splits the text into words and sentences, counts the paragraphs, and calculates the character counts and an approximate line count (assuming about 70 characters per line).
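To see what these splitting rules produce, here is a small worked example on a made-up sample string. It reuses the same two regular expressions and also shows one limitation worth keeping in mind: a period inside a number is treated as a sentence boundary.

```javascript
// Made-up sample standing in for the concatenated paragraph text.
var sample = "Macros save time. Do they work? Yes! ";

// Same splitting rules as calculateBasicStats().
var words = sample.split(/\s+/).filter(function (word) {
  return word.length > 0;
});
var sentences = sample.split(/[.!?]+/).filter(function (sentence) {
  return sentence.trim().length > 0;
});

console.log(words.length);     // 7
console.log(sentences.length); // 3

// Limitation: the period in "8.3" also splits the text.
var tricky = "Version 8.3 is out.";
var trickySentences = tricky.split(/[.!?]+/).filter(function (sentence) {
  return sentence.trim().length > 0;
});
console.log(trickySentences.length); // 2, although there is only one sentence
```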
Performing advanced analysis
For more detailed insight, the calculateAdvancedStats() function computes more sophisticated metrics:
function calculateAdvancedStats(text, basicStats) {
  // Average sentence length
  var avgWordsPerSentence = basicStats.wordCount / Math.max(1, basicStats.sentenceCount);
  // Average paragraph length
  var avgWordsPerParagraph = basicStats.wordCount / Math.max(1, basicStats.paragraphCount);
  // Average word length
  var totalWordLength = basicStats.words.reduce(function(sum, word) {
    return sum + word.length;
  }, 0);
  var avgWordLength = totalWordLength / Math.max(1, basicStats.wordCount);
  // Readability score: Flesch Reading Ease formula, with average word length in
  // characters used in place of syllables per word (a rough simplification).
  // avgWordLength is already guarded against division by zero.
  var readabilityScore = 206.835 - 1.015 * avgWordsPerSentence - 84.6 * avgWordLength;
  // Estimated reading time
  var readingTimeMinutes = Math.ceil(basicStats.wordCount / 200);
  return {
    avgWordsPerSentence: avgWordsPerSentence,
    avgWordsPerParagraph: avgWordsPerParagraph,
    avgWordLength: avgWordLength,
    readabilityScore: readabilityScore,
    readingTimeMinutes: readingTimeMinutes
  };
}
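This function calculates the average sentence, paragraph, and word length, a readability score, and the estimated reading time (assuming 200 words per minute). The readability score follows the Flesch Reading Ease formula but uses characters per word instead of syllables per word, so its values are not comparable to standard Flesch scores.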
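For a score closer to the standard Flesch Reading Ease scale, syllables per word would be needed. The following is a minimal sketch of one possible approach, not part of the original macro: estimateSyllables() and fleschReadingEase() are hypothetical names, and the syllable count is a rough English-only heuristic that counts groups of consecutive vowels.

```javascript
// Hypothetical heuristic: one syllable per group of consecutive vowels.
// It overcounts silent vowels (for example "time" is counted as 2).
function estimateSyllables(word) {
  var groups = word.toLowerCase().match(/[aeiouy]+/g);
  return groups ? groups.length : 1;
}

// Flesch Reading Ease: 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
function fleschReadingEase(wordCount, sentenceCount, syllableCount) {
  var wordsPerSentence = wordCount / Math.max(1, sentenceCount);
  var syllablesPerWord = syllableCount / Math.max(1, wordCount);
  return 206.835 - 1.015 * wordsPerSentence - 84.6 * syllablesPerWord;
}

var words = ["The", "cat", "sat", "on", "the", "mat"];
var syllables = words.reduce(function (sum, word) {
  return sum + estimateSyllables(word);
}, 0);

console.log(syllables); // 6
// Roughly 116.1 for this very simple one-sentence sample.
console.log(fleschReadingEase(words.length, 1, syllables));
```

In the macro, the words array from calculateBasicStats() could be passed through estimateSyllables() in the same way, ideally after stripping punctuation.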
Analyzing word frequency
The findCommonWords() function identifies the most frequently used words:
function findCommonWords(text, limit) {
  // Clean text and convert to lowercase (question marks are stripped too)
  var cleanText = text.toLowerCase().replace(/[.,\/#!?$%\^&\*;:{}=\-_`~()]/g, "");
  // Split into words
  var words = cleanText.split(/\s+/).filter(function(word) {
    return word.length > 3;
  });
  // Calculate word frequencies; a prototype-free object avoids clashes with
  // inherited keys such as "constructor"
  var wordFrequency = Object.create(null);
  words.forEach(function(word) {
    wordFrequency[word] = (wordFrequency[word] || 0) + 1;
  });
  // Filter stop words
  var stopWords = ["this", "that", "with", "from", "have", "been"];
  stopWords.forEach(function(stopWord) {
    delete wordFrequency[stopWord];
  });
  // Sort by frequency
  var sortedWords = Object.keys(wordFrequency).sort(function(a, b) {
    return wordFrequency[b] - wordFrequency[a];
  });
  // Return top N words
  return sortedWords.slice(0, limit).map(function(word) {
    return { word: word, frequency: wordFrequency[word] };
  });
}
This function strips punctuation, filters out common filler words, and returns the most frequently used words in the document. Words of three characters or fewer are ignored, and the stop-word list covers English only.
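Because the stop-word list is hard-coded in English, documents in other languages would need their own list. One way to handle this, sketched here under the assumption that the caller supplies the list, is a variant that takes the stop words as a parameter. The name countTopWords() is hypothetical and not part of the original macro.

```javascript
// Hypothetical variant of findCommonWords() with a caller-supplied stop-word list.
function countTopWords(text, limit, stopWords) {
  // Prototype-free lookup tables, so words like "constructor" are handled safely.
  var skip = Object.create(null);
  stopWords.forEach(function (word) { skip[word] = true; });

  var frequency = Object.create(null);
  text.toLowerCase()
    .replace(/[.,\/#!?$%\^&\*;:{}=\-_`~()]/g, "")
    .split(/\s+/)
    .forEach(function (word) {
      // Skip short words and stop words before counting.
      if (word.length > 3 && !skip[word]) {
        frequency[word] = (frequency[word] || 0) + 1;
      }
    });

  return Object.keys(frequency)
    .sort(function (a, b) { return frequency[b] - frequency[a]; })
    .slice(0, limit)
    .map(function (word) { return { word: word, frequency: frequency[word] }; });
}

var sampleText = "This report covers the report macro. The macro helps with this report.";
var top = countTopWords(sampleText, 2, ["this", "with"]);
console.log(JSON.stringify(top));
// [{"word":"report","frequency":3},{"word":"macro","frequency":2}]
```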
Generating the report
Finally, the createAndAddReport() function compiles and formats all the analysis results:
function createAndAddReport(oDocument, basicStats, advancedStats, commonWords) {
  // Add new page
  var oParagraph = Api.CreateParagraph();
  oParagraph.AddPageBreak();
  oDocument.AddElement(oDocument.GetElementsCount(), oParagraph);
  // Add title
  var oHeading = Api.CreateParagraph();
  oHeading.AddText("DOCUMENT ANALYSIS REPORT");
  oDocument.AddElement(oDocument.GetElementsCount(), oHeading);
  // Add basic statistics section
  var oSubHeading = Api.CreateParagraph();
  oSubHeading.AddText("BASIC STATISTICS");
  oDocument.AddElement(oDocument.GetElementsCount(), oSubHeading);
  // Add statistics content
  // ... (code that adds individual statistics)
  // Add advanced analysis section
  // ... (code that adds advanced metrics)
  // Add word frequency section
  // ... (code that adds word frequency list)
  // Add footer
  var oFootnotePara = Api.CreateParagraph();
  oFootnotePara.AddText("This report was generated by OnlyOffice Document Statistics and Analysis Tool on " +
    new Date().toLocaleString() + ".");
  oDocument.AddElement(oDocument.GetElementsCount(), oFootnotePara);
}
This function creates a structured report at the end of the document with all the analysis results.
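The report is built by repeating the same three calls for every line: create a paragraph, add text, append it to the document. As a possible simplification, that pattern could be wrapped in a small helper. This is only a sketch: appendLine() and appendLines() are hypothetical names, and the Api object and document are passed in as arguments so the helpers can also be exercised outside the editor.

```javascript
// Hypothetical helper: appends one paragraph of plain text to the end of the document.
// Uses only the calls already shown above: CreateParagraph, AddText,
// GetElementsCount, and AddElement.
function appendLine(api, doc, text) {
  var paragraph = api.CreateParagraph();
  paragraph.AddText(text);
  doc.AddElement(doc.GetElementsCount(), paragraph);
}

// Hypothetical helper: appends several lines in order.
function appendLines(api, doc, lines) {
  lines.forEach(function (line) {
    appendLine(api, doc, line);
  });
}

// Runs only inside the ONLYOFFICE macro environment, where Api is defined.
// The statistic shown is an illustrative value, not real output.
if (typeof Api !== "undefined") {
  var oDocument = Api.GetDocument();
  appendLines(Api, oDocument, [
    "BASIC STATISTICS",
    "• Word Count: 120"
  ]);
}
```

With such a helper, each section of the report reduces to one array of strings, which keeps the order of the lines easy to read and change.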
Complete macro code
Here is the complete macro code, ready to copy and use:
(function() {
  // Main function - starts all operations
  function analyzeDocument() {
    try {
      // Get document and all text
      var oDocument = Api.GetDocument();
      var allText = "";
      var paragraphs = oDocument.GetAllParagraphs();
      // Check if document is empty
      if (paragraphs.length === 0) {
        console.log("Warning: Document is empty or no paragraphs found for analysis.");
        return;
      }
      // Collect all text
      paragraphs.forEach(function(paragraph) {
        allText += paragraph.GetText() + " ";
      });
      // Calculate basic statistics
      var stats = calculateBasicStats(allText, paragraphs);
      // Perform advanced analysis
      var advancedStats = calculateAdvancedStats(allText, stats);
      // Find most common words
      var commonWords = findCommonWords(allText, 10);
      // Create and add report to the document
      createAndAddReport(oDocument, stats, advancedStats, commonWords);
      // Inform user
      console.log("Success: Document analysis completed. Report added to the end of the document.");
    } catch (error) {
      console.log("Error: An error occurred during processing: " + error.message);
    }
  }
  // Calculate basic statistics
  function calculateBasicStats(text, paragraphs) {
    // Word count
    var words = text.split(/\s+/).filter(function(word) {
      return word.length > 0;
    });
    var wordCount = words.length;
    // Sentence count
    var sentences = text.split(/[.!?]+/).filter(function(sentence) {
      return sentence.trim().length > 0;
    });
    var sentenceCount = sentences.length;
    // Paragraph count
    var paragraphCount = paragraphs.length;
    // Character count (with and without spaces)
    var charCountWithSpaces = text.length;
    var charCountWithoutSpaces = text.replace(/\s+/g, "").length;
    // Line count (approximate)
    var lineCount = Math.ceil(charCountWithSpaces / 70); // Approximately 70 characters/line
    return {
      wordCount: wordCount,
      sentenceCount: sentenceCount,
      paragraphCount: paragraphCount,
      charCountWithSpaces: charCountWithSpaces,
      charCountWithoutSpaces: charCountWithoutSpaces,
      lineCount: lineCount,
      words: words,
      sentences: sentences
    };
  }
  // Calculate advanced statistics
  function calculateAdvancedStats(text, basicStats) {
    // Average sentence length (in words)
    var avgWordsPerSentence = basicStats.wordCount / Math.max(1, basicStats.sentenceCount);
    // Average paragraph length (in words)
    var avgWordsPerParagraph = basicStats.wordCount / Math.max(1, basicStats.paragraphCount);
    // Average word length (in characters)
    var totalWordLength = basicStats.words.reduce(function(sum, word) {
      return sum + word.length;
    }, 0);
    var avgWordLength = totalWordLength / Math.max(1, basicStats.wordCount);
    // Readability score: Flesch Reading Ease formula, with average word length in
    // characters used in place of syllables per word (a rough simplification)
    var readabilityScore = 206.835 - 1.015 * avgWordsPerSentence - 84.6 * avgWordLength;
    // Estimated reading time (minutes)
    var readingTimeMinutes = Math.ceil(basicStats.wordCount / 200); // Average reading speed 200 words/minute
    return {
      avgWordsPerSentence: avgWordsPerSentence,
      avgWordsPerParagraph: avgWordsPerParagraph,
      avgWordLength: avgWordLength,
      readabilityScore: readabilityScore,
      readingTimeMinutes: readingTimeMinutes
    };
  }
  // Find most common words
  function findCommonWords(text, limit) {
    // Clean text and convert to lowercase (question marks are stripped too)
    var cleanText = text.toLowerCase().replace(/[.,\/#!?$%\^&\*;:{}=\-_`~()]/g, "");
    // Split into words
    var words = cleanText.split(/\s+/).filter(function(word) {
      return word.length > 3; // Filter out very short words
    });
    // Calculate word frequencies; a prototype-free object avoids clashes with
    // inherited keys such as "constructor"
    var wordFrequency = Object.create(null);
    words.forEach(function(word) {
      if (wordFrequency[word]) {
        wordFrequency[word]++;
      } else {
        wordFrequency[word] = 1;
      }
    });
    // Filter stop words (common English words)
    var stopWords = ["this", "that", "these", "those", "with", "from", "have", "been", "were", "they", "their", "what", "when", "where", "which", "there", "will", "would", "could", "should", "about", "also"];
    stopWords.forEach(function(stopWord) {
      if (wordFrequency[stopWord]) {
        delete wordFrequency[stopWord];
      }
    });
    // Sort by frequency
    var sortedWords = Object.keys(wordFrequency).sort(function(a, b) {
      return wordFrequency[b] - wordFrequency[a];
    });
    // Take top N words
    var topWords = sortedWords.slice(0, limit);
    // Return results as word-frequency pairs
    return topWords.map(function(word) {
      return {
        word: word,
        frequency: wordFrequency[word]
      };
    });
  }
  // Create and add report to document
  function createAndAddReport(oDocument, basicStats, advancedStats, commonWords) {
    // Add new page
    var oParagraph = Api.CreateParagraph();
    oParagraph.AddPageBreak();
    oDocument.AddElement(oDocument.GetElementsCount(), oParagraph);
    // Main title - highlighting in capital letters
    var oHeading = Api.CreateParagraph();
    oHeading.AddText("DOCUMENT ANALYSIS REPORT");
    oDocument.AddElement(oDocument.GetElementsCount(), oHeading);
    // Subheading - in capital letters
    var oSubHeading = Api.CreateParagraph();
    oSubHeading.AddText("BASIC STATISTICS");
    oDocument.AddElement(oDocument.GetElementsCount(), oSubHeading);
    // Add basic statistics
    var oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Word Count: " + basicStats.wordCount);
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Sentence Count: " + basicStats.sentenceCount);
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Paragraph Count: " + basicStats.paragraphCount);
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Character Count (with spaces): " + basicStats.charCountWithSpaces);
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Character Count (without spaces): " + basicStats.charCountWithoutSpaces);
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Estimated Line Count: " + basicStats.lineCount);
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    // Advanced analysis title
    oSubHeading = Api.CreateParagraph();
    oSubHeading.AddText("ADVANCED ANALYSIS");
    oDocument.AddElement(oDocument.GetElementsCount(), oSubHeading);
    // Add advanced analysis results
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Average Sentence Length: " + advancedStats.avgWordsPerSentence.toFixed(2) + " words");
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Average Paragraph Length: " + advancedStats.avgWordsPerParagraph.toFixed(2) + " words");
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Average Word Length: " + advancedStats.avgWordLength.toFixed(2) + " characters");
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Readability Score: " + advancedStats.readabilityScore.toFixed(2));
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    oStatsPara = Api.CreateParagraph();
    oStatsPara.AddText("• Estimated Reading Time: " + advancedStats.readingTimeMinutes + " minutes");
    oDocument.AddElement(oDocument.GetElementsCount(), oStatsPara);
    // Common words title
    oSubHeading = Api.CreateParagraph();
    oSubHeading.AddText("MOST FREQUENTLY USED WORDS");
    oDocument.AddElement(oDocument.GetElementsCount(), oSubHeading);
    // We'll create a simple list instead of a table
    if (commonWords.length > 0) {
      for (var i = 0; i < commonWords.length; i++) {
        var oWordPara = Api.CreateParagraph();
        oWordPara.AddText((i + 1) + ". " + commonWords[i].word + " (" + commonWords[i].frequency + " times)");
        oDocument.AddElement(oDocument.GetElementsCount(), oWordPara);
      }
    } else {
      var oNoneFoundPara = Api.CreateParagraph();
      oNoneFoundPara.AddText("No frequently used words found.");
      oDocument.AddElement(oDocument.GetElementsCount(), oNoneFoundPara);
    }
    // Footer note
    var oFootnotePara = Api.CreateParagraph();
    oFootnotePara.AddText("This report was generated by OnlyOffice Document Statistics and Analysis Tool on " +
      new Date().toLocaleString() + ".");
    oDocument.AddElement(oDocument.GetElementsCount(), oFootnotePara);
  }
  // Run the macro
  analyzeDocument();
})();
To use this macro in ONLYOFFICE:
- Open your document in ONLYOFFICE.
- Go to the View tab and select Macros.
- Create a new macro and paste the code.
- Run the macro.
- A detailed analysis report is added to the end of your document.
Now let's run the macro and see how it works!
This macro is a useful tool for professionals who want to automate text analysis and documentation tasks in a modern office environment. We hope it will be a helpful addition to your toolkit.
We encourage you to explore the ONLYOFFICE API documentation to create your own custom macros or improve this one. If you have ideas for improvements or suggestions for new macros, feel free to contact us. Your feedback helps us keep building tools that make creating and editing documents more efficient.