feat: PDF blocks processing is implemented #7

Open
wants to merge 7 commits into
base: myhailochernyshov/block-types-processing-refactoring
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1,6 +1,7 @@
include LICENSE
include README.rst

recursive-include src/cc2olx/templates *
recursive-include requirements *
recursive-include tests *
recursive-exclude * __pycache__
17 changes: 16 additions & 1 deletion README.rst
@@ -16,6 +16,7 @@ Converted:
- Some videos
- LTI links
- QTI assessments
- PDF files

Not converted:

@@ -60,10 +61,24 @@ The link map file can be supplied using `-f` or `--link_file`::

If the original course content contains relative links and the resources
(images, documents etc) the links point to are not included into the exported
-course dump, you can specify their source using `-s` flag:
+course dump, you can specify their source using `-s` flag::

    cc2olx -i <IMSCC_FILE> -s <RELATIVE_LINKS_SOURCE>

If some custom xBlocks are installed on the target Open edX instance, the
corresponding blocks can be specified with the `-c` argument. If content that
such xBlocks can render is found during course conversion, the xBlocks will be
used. The argument values correspond to the xBlock names to specify in
`advanced_modules` in the course advanced settings.

Supported Custom xBlocks:

- `pdf <https://github.com/raccoongang/xblock-pdf>`_

Argument usage example::

    cc2olx -i <IMSCC_FILE> -c <CUSTOM_BLOCK_1_NAME> -c <CUSTOM_BLOCK_2_NAME>

Dockerization
-------------

2 changes: 1 addition & 1 deletion pytest.ini
@@ -1,3 +1,3 @@
[pytest]
usefixtures = chdir_to_workspace
-DJANGO_SETTINGS_MODULE = cc2olx.django_settings
+DJANGO_SETTINGS_MODULE = cc2olx.settings
1 change: 1 addition & 0 deletions requirements/base.in
@@ -1,6 +1,7 @@
# Core requirements for this package

Django
attrs
lxml
requests
youtube-dl
2 changes: 2 additions & 0 deletions requirements/base.txt
@@ -6,6 +6,8 @@
#
asgiref==3.8.1
# via django
attrs==24.3.0
# via -r requirements/base.in
backports-zoneinfo==0.2.1
# via django
certifi==2024.12.14
4 changes: 4 additions & 0 deletions requirements/ci.txt
@@ -8,6 +8,10 @@ asgiref==3.8.1
# via
# -r /home/misha/work/cc2olx/requirements/quality.txt
# django
attrs==24.3.0
# via
# -c /home/misha/work/cc2olx/requirements/constraints.txt
# -r /home/misha/work/cc2olx/requirements/quality.txt
backports-zoneinfo==0.2.1
# via
# -r /home/misha/work/cc2olx/requirements/quality.txt
2 changes: 2 additions & 0 deletions requirements/constraints.txt
@@ -7,3 +7,5 @@
# link to other information that will help people in the future to remove the
# pin when possible. Writing an issue against the offending project and
# linking to it here is good.

attrs==24.3.0
5 changes: 5 additions & 0 deletions requirements/dev.txt
@@ -9,6 +9,11 @@ asgiref==3.8.1
# -r /home/misha/work/cc2olx/requirements/ci.txt
# -r /home/misha/work/cc2olx/requirements/quality.txt
# django
attrs==24.3.0
# via
# -c /home/misha/work/cc2olx/requirements/constraints.txt
# -r /home/misha/work/cc2olx/requirements/ci.txt
# -r /home/misha/work/cc2olx/requirements/quality.txt
backports-tarfile==1.2.0
# via jaraco-context
backports-zoneinfo==0.2.1
4 changes: 4 additions & 0 deletions requirements/quality.txt
@@ -8,6 +8,10 @@ asgiref==3.8.1
# via
# -r /home/misha/work/cc2olx/requirements/test.txt
# django
attrs==24.3.0
# via
# -c /home/misha/work/cc2olx/requirements/constraints.txt
# -r /home/misha/work/cc2olx/requirements/test.txt
backports-zoneinfo==0.2.1
# via
# -r /home/misha/work/cc2olx/requirements/test.txt
4 changes: 4 additions & 0 deletions requirements/test.txt
@@ -8,6 +8,10 @@ asgiref==3.8.1
# via
# -r /home/misha/work/cc2olx/requirements/base.txt
# django
attrs==24.3.0
# via
# -c /home/misha/work/cc2olx/requirements/constraints.txt
# -r /home/misha/work/cc2olx/requirements/base.txt
backports-zoneinfo==0.2.1
# via
# -r /home/misha/work/cc2olx/requirements/base.txt
2 changes: 1 addition & 1 deletion setup.py
@@ -25,7 +25,7 @@
         "Programming Language :: Python :: 3.8",
         "Topic :: Utilities",
     ],
-    description=("Command line tool, that converts Common Cartridge " "courses to Open edX Studio imports."),
+    description="Command line tool, that converts Common Cartridge courses to Open edX Studio imports.",
     entry_points={"console_scripts": ["cc2olx=cc2olx.main:main"]},
     install_requires=load_requirements("requirements/base.in"),
     license="GNU Affero General Public License",
35 changes: 35 additions & 0 deletions src/cc2olx/cli.py
@@ -1,12 +1,39 @@
import argparse
import logging

from pathlib import Path

from cc2olx.enums import SupportedCustomBlockContentType
from cc2olx.validators.cli import link_source_validator

RESULT_TYPE_FOLDER = "folder"
RESULT_TYPE_ZIP = "zip"

logger = logging.getLogger()


class AppendIfAllowedAction(argparse._AppendAction):
    """
    Store a list and append only allowed argument values to the list.
    """

    NOT_ALLOWED_CHOICE_MESSAGE = (
        "The choice {choice_name!r} is not allowed for {argument_name} argument. It will be ignored during processing."
    )

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self._choices = self.choices
        self.choices = None

    def __call__(self, parser, namespace, values, option_string=None):
        if values in self._choices:
            super().__call__(parser, namespace, values, option_string)
        else:
            argument_name = "/".join(self.option_strings)
            logger.warning(self.NOT_ALLOWED_CHOICE_MESSAGE.format(choice_name=values, argument_name=argument_name))


def parse_args(args=None):
    parser = argparse.ArgumentParser(
@@ -78,4 +105,12 @@ def parse_args(args=None):
        type=link_source_validator,
        help="The relative links source in the format '<scheme>://<netloc>', e.g. 'https://example.com'",
    )
    parser.add_argument(
        "-c",
        "--content_types_with_custom_blocks",
        action=AppendIfAllowedAction,
        default=[],
        choices=list(SupportedCustomBlockContentType.__members__.values()),
        help="Names of content types for which custom xblocks will be used.",
    )
    return parser.parse_args(args)
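
The `-c` handling above can be tried in isolation. The following is a minimal sketch rather than the PR's implementation: it builds on the public `argparse.Action` base class instead of the private `argparse._AppendAction`, and `DemoContentType`, `AppendIfAllowed`, the `allowed` keyword and `build_parser` are hypothetical names introduced only for this example (the real `SupportedCustomBlockContentType` enum is not shown in this excerpt).

```python
import argparse
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class DemoContentType(str, Enum):
    """Hypothetical stand-in for SupportedCustomBlockContentType."""

    PDF = "pdf"


class AppendIfAllowed(argparse.Action):
    """Append a value to a list only if it is allowed; otherwise warn and skip it."""

    def __init__(self, option_strings, dest, allowed=(), **kwargs):
        super().__init__(option_strings, dest, **kwargs)
        # Keep a private allow-list so argparse itself never exits on an unknown value.
        self._allowed = {item.value for item in allowed}

    def __call__(self, parser, namespace, values, option_string=None):
        if values not in self._allowed:
            logger.warning("Ignoring %r for %s", values, "/".join(self.option_strings))
            return
        # Copy the current list so the shared default object is never mutated.
        items = list(getattr(namespace, self.dest, None) or [])
        items.append(values)
        setattr(namespace, self.dest, items)


def build_parser():
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument(
        "-c",
        "--content_types_with_custom_blocks",
        action=AppendIfAllowed,
        allowed=list(DemoContentType),
        default=[],
    )
    return parser


# The unknown value is logged and dropped instead of aborting the run.
args = build_parser().parse_args(["-c", "pdf", "-c", "unknown", "-c", "pdf"])
print(args.content_types_with_custom_blocks)
```

Passing the allow-list through a separate keyword, rather than through `choices`, means argparse's own validation is never triggered, so an unsupported value produces a warning instead of a usage error.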
10 changes: 8 additions & 2 deletions src/cc2olx/constants.py
@@ -1,3 +1,9 @@
-CDATA_PATTERN = r"<!\[CDATA\[(?P<content>.*?)\]\]>"
 OLX_STATIC_DIR = "static"
-OLX_STATIC_PATH_TEMPLATE = f"/{OLX_STATIC_DIR}/{{static_filename}}"
+OLX_STATIC_PATH_TEMPLATE = f"/{OLX_STATIC_DIR}/{{static_file_path}}"
+WEB_RESOURCES_DIR_NAME = "web_resources"
+
+LINK_HTML = '<a href="{url}">{text}</a>'
+YOUTUBE_LINK_PATTERN = r"youtube.com/watch\?v=(?P<video_id>[-\w]+)"
+CDATA_PATTERN = r"<!\[CDATA\[(?P<content>.*?)\]\]>"
+
+QTI_RESPROCESSING_TYPES = ["general_fb", "correct_fb", "general_incorrect_fb"]
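
To make the new constants concrete, here is a small sketch of how they behave when used with `str.format` and `re.search`. The constant values are copied from the diff above; the sample file path, URL and CDATA string are made-up inputs, and how the project itself applies these constants is not shown in this excerpt.

```python
import re

# Values copied from the constants in the diff above.
OLX_STATIC_DIR = "static"
OLX_STATIC_PATH_TEMPLATE = f"/{OLX_STATIC_DIR}/{{static_file_path}}"
YOUTUBE_LINK_PATTERN = r"youtube.com/watch\?v=(?P<video_id>[-\w]+)"
CDATA_PATTERN = r"<!\[CDATA\[(?P<content>.*?)\]\]>"

# Hypothetical sample inputs, used only for illustration.
static_path = OLX_STATIC_PATH_TEMPLATE.format(static_file_path="docs/syllabus.pdf")
video_match = re.search(YOUTUBE_LINK_PATTERN, "https://www.youtube.com/watch?v=abc-123_XYZ")
cdata_match = re.search(CDATA_PATTERN, "<![CDATA[<p>Hello</p>]]>")

print(static_path)
print(video_match.group("video_id"))
print(cdata_match.group("content"))
```

Two details are worth noting when reading the patterns: the dot in `youtube.com` is unescaped, so it matches any character, and `.*?` in `CDATA_PATTERN` does not cross line breaks unless the caller passes `re.DOTALL`.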
18 changes: 18 additions & 0 deletions src/cc2olx/content_parsers/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
from cc2olx.content_parsers.abc import AbstractContentParser, AbstractContentTypeWithCustomBlockParser
from cc2olx.content_parsers.discussion import DiscussionContentParser
from cc2olx.content_parsers.html import HtmlContentParser
from cc2olx.content_parsers.lti import LtiContentParser
from cc2olx.content_parsers.pdf import PdfContentParser
from cc2olx.content_parsers.qti import QtiContentParser
from cc2olx.content_parsers.video import VideoContentParser

__all__ = [
    "AbstractContentParser",
    "AbstractContentTypeWithCustomBlockParser",
    "DiscussionContentParser",
    "HtmlContentParser",
    "LtiContentParser",
    "PdfContentParser",
    "QtiContentParser",
    "VideoContentParser",
]
52 changes: 52 additions & 0 deletions src/cc2olx/content_parsers/abc.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
from abc import ABC, abstractmethod
from typing import Optional, Union

from cc2olx.content_parsers.utils import StaticLinkProcessor
from cc2olx.dataclasses import ContentParserContext
from cc2olx.enums import SupportedCustomBlockContentType
from cc2olx.models import Cartridge


class AbstractContentParser(ABC):
    """
    Abstract base class for parsing Common Cartridge content.
    """

    def __init__(self, cartridge: Cartridge, context: ContentParserContext) -> None:
        self._cartridge = cartridge
        self._context = context

    def parse(self, idref: Optional[str]) -> Optional[Union[list, dict]]:
        """
        Parse the resource with the specified identifier.
        """
        if content := self._parse_content(idref):
            link_processor = StaticLinkProcessor(self._cartridge, self._context.relative_links_source)
            content = link_processor.process_content_static_links(content)
        return content

    @abstractmethod
    def _parse_content(self, idref: Optional[str]) -> Optional[Union[list, dict]]:
        """
        Parse content of the resource with the specified identifier.
        """


class AbstractContentTypeWithCustomBlockParser(AbstractContentParser, ABC):
    """
    Abstract base class for content type with custom block parsing.
    """

    CUSTOM_BLOCK_CONTENT_TYPE: SupportedCustomBlockContentType

    def _parse_content(self, idref: Optional[str]) -> Optional[Union[list, dict]]:
        if idref and self._context.is_content_type_with_custom_block_used(self.CUSTOM_BLOCK_CONTENT_TYPE):
            if resource := self._cartridge.define_resource(idref):
                return self._parse_resource_content(resource)
        return None

    @abstractmethod
    def _parse_resource_content(self, resource: dict) -> Optional[Union[list, dict]]:
        """
        Parse resource content.
        """
44 changes: 44 additions & 0 deletions src/cc2olx/content_parsers/discussion.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
import re
from typing import Dict, Optional

from cc2olx import filesystem
from cc2olx.content_parsers import AbstractContentParser
from cc2olx.enums import CommonCartridgeResourceType
from cc2olx.models import ResourceFile


class DiscussionContentParser(AbstractContentParser):
    """
    Discussion resource content parser.
    """

    def _parse_content(self, idref: Optional[str]) -> Optional[Dict[str, str]]:
        if idref:
            if resource := self._cartridge.define_resource(idref):
                if re.match(CommonCartridgeResourceType.DISCUSSION_TOPIC, resource["type"]):
                    return self._parse_discussion(resource)
        return None

    def _parse_discussion(self, resource: dict) -> Dict[str, str]:
        """
        Parse the discussion content.
        """
        data = {}

        for child in resource["children"]:
            if isinstance(child, ResourceFile):
                data.update(self._parse_resource_file_data(child, resource["type"]))

        return data

    def _parse_resource_file_data(self, resource_file: ResourceFile, resource_type: str) -> Dict[str, str]:
        """
        Parse the discussion resource file.
        """
        tree = filesystem.get_xml_tree(self._cartridge.build_resource_file_path(resource_file.href))
        root = tree.getroot()

        return {
            "title": root.get_title(resource_type).text,
            "text": root.get_text(resource_type).text,
        }
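
The parser classes in this PR share one layout: a public `parse()` in the base class calls a subclass hook and then applies a common post-processing step. The sketch below shows that layout with standard-library pieces only. Everything in it is hypothetical and simplified: `DemoParser`, `DemoDiscussionParser`, the `RESOURCES` dictionary (a stand-in for the cartridge), the resource type names and the sample XML are assumptions for illustration, and whitespace trimming stands in for the static link processing done by the real base class.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from typing import Dict, Optional


class DemoParser(ABC):
    """Hypothetical base class: parse() wraps the subclass hook with a shared step."""

    def __init__(self, resources: Dict[str, dict]) -> None:
        # Stand-in for the cartridge: maps an idref to a resource dict.
        self._resources = resources

    def parse(self, idref: Optional[str]) -> Optional[Dict[str, str]]:
        content = self._parse_content(idref)
        if content:
            # Stand-in for static link processing: only trim whitespace here.
            content = {key: value.strip() for key, value in content.items()}
        return content

    @abstractmethod
    def _parse_content(self, idref: Optional[str]) -> Optional[Dict[str, str]]:
        """Return parsed content, or None if this parser does not apply."""


class DemoDiscussionParser(DemoParser):
    RESOURCE_TYPE = "demo_discussion"  # hypothetical type name

    def _parse_content(self, idref: Optional[str]) -> Optional[Dict[str, str]]:
        resource = self._resources.get(idref) if idref else None
        if not resource or resource["type"] != self.RESOURCE_TYPE:
            return None
        root = ET.fromstring(resource["xml"])
        title = root.find("title")
        text = root.find("text")
        return {
            "title": (title.text or "") if title is not None else "",
            "text": (text.text or "") if text is not None else "",
        }


RESOURCES = {
    "topic_1": {
        "type": "demo_discussion",
        "xml": "<topic><title> Week 1 </title><text>&lt;p&gt;Introduce yourself.&lt;/p&gt;</text></topic>",
    },
    "page_1": {"type": "demo_webcontent", "xml": "<html/>"},
}

parser = DemoDiscussionParser(RESOURCES)
print(parser.parse("topic_1"))  # parsed and post-processed content
print(parser.parse("page_1"))   # resource type does not match
print(parser.parse(None))       # no idref given
```

Returning `None` from the hook when a resource does not match lets a caller try several parsers in turn and use the first one that produces content.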