Old 10-21-2020, 03:40 AM   #1
EbookMakers
Enthusiast
EbookMakers began at the beginning.
 
Posts: 26
Karma: 38
Join Date: Nov 2019
Location: Paris, France
Device: none
Regex-function to merge endnotes files in editor

Do you have old EPUBs with one XHTML page per endnote? Since version 3 of calibre, Kovid has offered a checkbox (in the DOCX input configuration) that prevents this separation of the endnotes during a DOCX -> EPUB conversion by calibre. An EPUB -> EPUB conversion cannot change it.

The calibre interface makes it easy to group the notes into a single page manually; the slowest part is determining which files are affected...

I am trying to do this automatically with a regex-function. It runs without an error message.

I have two issues:

1) The editor interface is not updated when the regex-function finishes. If I save a copy of the EPUB and examine that file, the merge turns out to have succeeded. Without really knowing whether this was the right place to look, I tried apply_container_update_to_gui, without success. How do I update the interface?

2) The function is executed in "spine order". But how can I make it start with the 1st file of the book regardless of the current file, so that the notes are grouped in the 1st note file?

A test file is attached.

Spoiler:

The principle of the regex-function is as follows:

The regex selects content only in files that contain a single note. These files must follow the HTML syntax produced by a calibre conversion. Counting the regex matches shows whether it is worth running the regex-function.

The important thing is not what is selected but to retrieve the name of the file to add it to a list of files to be merged. This list of filenames grows with more occurrences of the regex.

The function chooses the 1st file as the master file. Merge will only run if there are at least 2 files to merge.
Attached Files
File Type: epub Test endnotes.epub (146.7 KB, 229 views)

Last edited by EbookMakers; 11-23-2020 at 09:30 PM.
Old 10-21-2020, 05:27 AM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 43,935
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Just ctrl-click the files in the file browser in the editor, then right-click and choose merge.
Old 10-21-2020, 07:24 AM   #3
EbookMakers
Thank you for your answer. I know; this is why I began by writing: "The interface of calibre makes it easy to manually group the notes into a single page, the longest being to determine which files are affected...". And the regex, without the function, can help me find which files are affected.
Will it make the function do what I hope?

Last edited by EbookMakers; 10-21-2020 at 07:29 AM.
Old 10-21-2020, 08:08 AM   #4
kovidgoyal
To refresh the UI, use the boss object:

from calibre.gui2.tweak_book.boss import get_boss
get_boss().apply_container_update_to_gui()
Old 10-21-2020, 09:11 AM   #5
EbookMakers
Thanks a lot, Kovid. I'll try.

Last edited by EbookMakers; 10-21-2020 at 10:15 AM.
Old 10-21-2020, 10:15 AM   #6
EbookMakers
After modifying the function as you indicated, the files that are not the "merge master" are deleted in the editor. But the "merge master" file still contains only one note.

However, the function itself runs correctly: on a "commit as", all the desired modifications are found in the saved file.

Is there also a way to force the search to start with the 1st file rather than the current file?
Attached Files
File Type: py merge-note5.py (1.6 KB, 187 views)

Last edited by EbookMakers; 10-21-2020 at 10:19 AM.
Old 11-23-2020, 09:05 PM   #7
EbookMakers
A test EPUB is attached to the first post of this topic. I can think of two solutions: one for well-behaved people like you and even me, and one for rascals. They use the same regex:

Code:
<body[^\n]*\n\K\s*(<h[^>]*>[^<]*</h\d>)?\s*<dl[^>]*>\s*<dt[^>]*>\[<a\b(?:(?!</dl).)+</dl>\s*(?=</body>)
The \K escape resets the start of the match: whatever was matched before it is dropped from the selection, so the expression placed before \K behaves like a positive lookbehind assertion. I use it, for my own reasons, to stay compatible with the PCRE engine, which does not accept variable-length lookbehind assertions such as the one needed here.
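For readers who want to try the idea outside the editor: Python's standard `re` module also rejects variable-length lookbehind, and to my knowledge it has no \K either. The sketch below shows the usual workaround, a capturing group around the part to keep. The sample markup is made up for illustration.

```python
import re

# A variable-length lookbehind is rejected by the standard `re` module.
try:
    re.compile(r'(?<=<body[^\n]*\n)\s*<dl')
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Workaround: match the prefix normally and capture only the part we want.
# Group 1 plays the role of "everything after \K".
pattern = re.compile(r'<body[^\n]*\n(\s*<dl[^>]*>)')
sample = '<body class="c">\n  <dl id="n1">'   # made-up sample markup
selected = pattern.search(sample).group(1)

print(lookbehind_ok)   # False
print(repr(selected))  # '  <dl id="n1">'
```

In the editor itself none of this is needed, since the regex above works as written there.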

On an EPUB that follows the HTML syntax produced by a DOCX -> EPUB conversion, the regex selects:

- the note, in files containing exactly one note in the conversion's syntax, making sure that the note is enclosed by the pair of body tags.
- in optional group 1, the title, which precedes only the 1st note (after the conversion).

The regex selects, one after another, the solitary notes that follow the conversion syntax. It therefore also tells you the names of the XHTML files that contain them. Running the regex with a count shows whether the EPUB needs the regex-function at all. Merging should only be requested if there are at least two notes. If group 1 exists, the file contains the 1st note.
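To check the three cases described above (first note, other solitary note, file with several notes), here is a rough adaptation of the regex for Python's standard `re` module. It is only a sketch: \K is replaced by a named group, the group names `note` and `title` and the helper `classify` are illustrative, the sample markup is invented, and `re.DOTALL` assumes the original is run with "dot all" enabled so that a note may span several lines.

```python
import re

# Same pattern as in the post, with \K replaced by the named group 'note'
# and the optional title (group 1 in the original) named 'title'.
SINGLE_NOTE = re.compile(
    r'<body[^\n]*\n'
    r'(?P<note>\s*(?P<title><h[^>]*>[^<]*</h\d>)?\s*'
    r'<dl[^>]*>\s*<dt[^>]*>\[<a\b(?:(?!</dl).)+</dl>\s*)'
    r'(?=</body>)',
    re.DOTALL,
)

def classify(xhtml):
    """Say what kind of note file this is, according to the regex."""
    m = SINGLE_NOTE.search(xhtml)
    if m is None:
        return 'not a single-note file'
    return 'first note' if m.group('title') else 'other note'

# Invented sample markup imitating one converted endnote.
NOTE = ('<dl class="n">\n<dt>[<a href="text.xhtml#r{0}">{0}</a>]</dt>\n'
        '<dd>Note {0}.</dd>\n</dl>\n')
first = '<body class="c">\n<h2>Notes</h2>\n' + NOTE.format(1) + '</body>'
other = '<body class="c">\n' + NOTE.format(2) + '</body>'
two = '<body class="c">\n' + NOTE.format(3) + NOTE.format(4) + '</body>'

for sample in (first, other, two):
    print(classify(sample))
```

A file with two notes is rejected because the negative lookahead stops at the first `</dl`, and what follows it is another `<dl>` rather than `</body>`.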

We cannot predict on which file the search will start, since that depends on the active file. We can ask it to go through the files in "spine" order with the attribute:
replace.file_order = 'spine'

We only know that the match for which group 1 exists is the 1st note. Both solutions rely on this to obtain a file whose notes start with the 1st note and then follow in the correct order. Otherwise, as stated in a previous message, the order of the notes in the resulting file would depend on the active file when the regex is launched.

One argument of the replace function is "data", a dict that persists for the whole run of the function. Both functions store their information in this dict.

It is possible to request that the function be called one last time after the last match:
replace.call_after_last_match = True

The merge is requested during this final call. Merge updates the note references in the text as well as the OPF file (since it deletes files). The display must then be refreshed in the editor, as Kovid wrote above:
get_boss().apply_container_update_to_gui()
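The calling sequence can be pictured with a small stand-alone simulation. `run_function` below is a hypothetical stand-in for the editor's search loop, not calibre's real code. In particular, passing `None` as `file_name` on the final call is an assumption, and the attribute is set right after the definition, which is how the calibre manual's examples appear to do it.

```python
import re

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    if match is None:
        # Final call: every match has been seen, so the list is complete
        data['done'] = True
        return None
    # 'data' is the same dict on every call, so the list keeps growing
    data.setdefault('files', []).append(file_name)
    return match.group()  # leave the matched text unchanged

replace.call_after_last_match = True

def run_function(func, pattern, files):
    """Hypothetical driver: call func for each match, then once more with None."""
    data = {}          # persistent dict shared by all calls
    number = 0
    result = {}
    for name, text in files.items():
        def repl(m, name=name):
            nonlocal number
            number += 1
            return func(m, number, name, None, None, data, None)
        result[name] = re.sub(pattern, repl, text)
    if getattr(func, 'call_after_last_match', False):
        func(None, number, None, None, None, data, None)
    return result, data

files = {'note1.xhtml': 'a NOTE b', 'note2.xhtml': 'NOTE'}
result, data = run_function(replace, 'NOTE', files)
print(data)
```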

A major problem is that the result of the regex-function comes from the "return" of the "replace" function, even though the merge is executed after the last match has been processed! One would have expected the result to come from the "merge". The main difference between the two solutions is how they work around this problem.

Both functions are commented.
Old 11-23-2020, 09:06 PM   #8
EbookMakers
The function for rascals

The function builds two lists of file names. Which list a file goes into depends on whether the file containing the title has already been encountered, which in turn depends on the active file when the regex is launched. The file containing the title is the one containing the 1st note, and it is the file into which the others are merged.

The two lists of file names are merged to get the complete list of files to be merged.
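The ordering trick on its own can be sketched without calibre. The helper `order_note_files` and the `(file_name, has_title)` pairs are illustrative: in the real function, `has_title` corresponds to `match.group(1)` being present.

```python
def order_note_files(visited):
    """Rebuild the book order from the order in which the search met the files.

    visited: list of (file_name, has_title) pairs, in visiting order.
    Files met from the title file onward go to first_notes; files met
    before it (the search started in the middle) go to last_notes.
    """
    first_notes, last_notes = [], []
    seen_title = False
    for file_name, has_title in visited:
        if has_title:
            seen_title = True
        if seen_title:
            first_notes.append(file_name)
        else:
            last_notes.append(file_name)
    return first_notes + last_notes

# The search started at note 3 and reached the end before meeting note 1.
visited = [('n3.xhtml', False), ('n4.xhtml', False),
           ('n1.xhtml', True), ('n2.xhtml', False)]
print(order_note_files(visited))
# ['n1.xhtml', 'n2.xhtml', 'n3.xhtml', 'n4.xhtml']
```

This only restores the right order if the files are visited in spine order, starting wherever the active file happens to be.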

We raise an exception with the 'raise' statement to stop the function after the 'merge' and before the 'return', which would cancel the result. This is the dirty side of the job.

At runtime, a warning message appears, which says: Merging: Files merged out.

Code:
from calibre.gui2.tweak_book import current_container
from calibre.gui2.tweak_book.boss import get_boss
from calibre.ebooks.oeb.polish.split import merge

class Merging(LookupError):
    # Warning: very dirty workaround.
    # Custom exception class, raised to end the job without a 'return'.
    # If we used 'return', we would lose the result of the merge.
    pass

    
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    
    if match is None:  # this is the final call (all matches found)
        # Merge the files named in 'note_files_list' (stored in the
        # persistent dict 'data', a parameter of replace()) into the file
        # named in 'merge_master' (also stored in data).
        # .get() avoids a KeyError if no match was ever recorded.
        data['note_files_list'] = data.get('first_notes', []) + data.get('last_notes', [])
        if len(data['note_files_list']) > 1:
            # If no title was matched, fall back to the first file in the list
            master = data.get('merge_master') or data['note_files_list'][0]
            merge(current_container(), 'text', data['note_files_list'], master)
            get_boss().apply_container_update_to_gui()

        # Very dirty trick: get out without executing a 'return'
        raise Merging("Files merged out")
                        
    else:

        if 'merge_master' not in data:
            # data is empty, therefore this is the 1st match:
            # the list of files and the merge master are initialized
            data['note_files_list'] = []
            data['merge_master'] = None
            
            if match.group(1):
                # If group 1 exists, the note contains the title and therefore the note is the 1st note
                # The master of merge becomes the current note file
                data['first'] = True
                data['merge_master'] = file_name
                data['first_notes'] = [file_name]
                data['last_notes'] = []
            else:
                data['first'] = False
                data['first_notes'] = []
                data['last_notes'] = [file_name]

            # Ask for a passage after the last find (match will be None)
            # Ask for processing the files in the order they appear in the book
            replace.call_after_last_match = True
            replace.file_order = 'spine'
           
        else:
            if match.group(1):
                data['first'] = True
                # The master of merge becomes the current note file
                data['merge_master'] = file_name
            # Increments the list of files by adding the name of the current file
            if data['first']:
                data['first_notes'].append(file_name)
            else:
                data['last_notes'].append(file_name)
           
        return match.group()

Last edited by EbookMakers; 11-24-2020 at 09:37 AM.
Old 11-23-2020, 09:07 PM   #9
EbookMakers
The function for well-behaved people

This function is not interrupted; the 'return' is executed. But this 'return' replaces the selected note with the set of notes encountered so far, ordered by a mechanism similar to that of the rascal function. Therefore, after 'merging' and 'returning', we get a single file containing all of the notes in order.
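The accumulation step can be tried in isolation with a toy model. The helper `accumulate` and the short strings standing in for note markup are illustrative only; the sketch covers what each call returns, not the merge itself.

```python
def accumulate(visited):
    """Return, per file, the text that would replace its note.

    visited: list of (file_name, note_text, has_title) in visiting order.
    Each file receives every note seen so far, ordered as in the book.
    """
    first_notes, last_notes = '', ''
    seen_title = False
    returned = {}
    for file_name, note, has_title in visited:
        if has_title:
            seen_title = True
        if seen_title:
            first_notes += note   # notes from the title file onward
        else:
            last_notes += note    # notes met before the title file
        returned[file_name] = first_notes + last_notes
    return returned

# The search started at note 3, then wrapped around to notes 1 and 2.
visited = [('n3.xhtml', '[3]', False),
           ('n1.xhtml', '[1]', True),
           ('n2.xhtml', '[2]', False)]
returned = accumulate(visited)
print(returned['n2.xhtml'])  # [1][2][3]
```

The last file visited ends up holding all the notes in order, which is why the function below keeps moving 'merge_master' to the current file.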

Code:
# <body[^\n]*\n\K\s*(<h[^>]*>[^<]*</h\d>)?\s*<dl[^>]*>\s*<dt[^>]*>\[<a\b(?:(?!</dl).)+</dl>\s*(?=</body>)

from calibre.gui2.tweak_book import current_container
from calibre.gui2.tweak_book.boss import get_boss
from calibre.ebooks.oeb.polish.split import merge

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):

    if match is None:  # this is the final call (all matches found)
        ctnr = current_container()
        # Merge at least 2 files whose names are in the list 'note_files_list'
        # (stored in the persistent dict 'data', a parameter of replace())
        # into the file whose name is in 'merge_master' (also stored in data)
        if data and len(data['note_files_list']) > 1:
            merge(ctnr, 'text', data['note_files_list'], data['merge_master'])
            get_boss().apply_container_update_to_gui()

    else:

        if 'merge_master' not in data :  
            # data is empty, therefore it's the 1st iteration
            # the list of files is initialized with the current note file
            # The master of merge is the current note file
            data['note_files_list'] = [file_name]
            data['merge_master'] = file_name

            if match.group(1):
                # If group 1 exists, the note contains the title and therefore the note is the 1st note
                data['first'] = True
                data['first_notes'] = match.group()
                data['last_notes'] = ''
            else:
                data['first'] = False
                data['first_notes'] = ''
                data['last_notes'] = match.group()

            # Ask for a passage after the last find (match will be None)
            # Ask for processing the files in the order they appear in the book
            replace.call_after_last_match = True
            replace.file_order = 'spine'
           
        else:
            # Increments the list of files by adding the name of the current file
            # The master of merge becomes the current note file
            data['note_files_list'].append(file_name)
            data['merge_master'] = file_name
            if match.group(1):
                data['first'] = True
            if data['first']:
                # If first is true, the function has already processed the 1st note,
                # we concatenate in first_notes
                data['first_notes'] = data['first_notes'] + match.group()
            else:
                # Otherwise in last_notes
                data['last_notes'] = data['last_notes'] + match.group()

        data['all_notes'] = data['first_notes'] + data['last_notes']
        # print(data['note_files_list'], data['merge_master'])
        return data['all_notes']

Last edited by EbookMakers; 11-23-2020 at 09:24 PM.
Old 11-24-2020, 09:39 AM   #10
EbookMakers
I modified the function of the #8 message to add the 'pass' statement in the class 'Merging', instead of relying on the comment to do nothing.

Last edited by EbookMakers; 11-24-2020 at 01:27 PM.
Old 11-26-2020, 11:09 PM   #11
roger64
Wizard
roger64 ought to be getting tired of karma fortunes by now.
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Hi

Maybe I misunderstood something, but I can't see why it would be necessary to use a regex-function for this.

1. The calibre editor, as Kovid wrote, can merge all the notes placed in their own pages.
2. On the CSS side, I fail to see the usefulness of ordered list code for the footnotes. It's a separate issue of course.

Here is the end result after these two changes.
Attached Files
File Type: epub Test endnotes-V2.epub (145.6 KB, 153 views)
Old 11-27-2020, 09:35 AM   #12
EbookMakers
The regex alone is useful, since its count tells you immediately whether the EPUB is affected. Unless we created them ourselves, we do not necessarily know much about our EPUBs.

It is true that a regex-function is not necessary to merge note files; this was already stated in message #1.

This is just a small, unpretentious exercise that first shows a use of 'merge' and 'apply_container_update_to_gui'.

It uses 'replace.call_after_last_match = True' and shows that the content of the 'return' prevails over changes made to the text by this final call, when one would expect the opposite. It gives two ways to overcome this constraint.

It also shows some data manipulation in the persistent dict 'data'.

Tags
regex-function

