Releases: lablup/backend.ai
24.03.7b3
Fixes
- Fix `BACKEND_MODEL_NAME` environment variable always being overwritten with the model name specified in the model definition (#2481)
- Do not allow assigning a preopen port which collides with the image's own service port definition (#2482)
- Fix GET requests with queryparams defined in API spec occasionally throwing 400 Bad Request error (#2483)
Full Changelog
Check out the full changelog until this release (24.03.7b3).
Full Commit Logs
Check out the full commit logs between release (24.03.7b2) and (24.03.7b3).
24.03.7b2
Fixes
- Fix incorrect check of values returned from docker stat API. (#2389)
- Handle all possible exceptions when scheduling single node session so that the status information of pending session is not empty. (#2411)
- Improve error handling of initialization failures in the kernel runner (#2478)
Full Changelog
Check out the full changelog until this release (24.03.7b2).
Full Commit Logs
Check out the full commit logs between release (24.03.7b1) and (24.03.7b2).
24.03.7b1
Features
- Add `row_id`, `type` and `container_registry` fields to the `GroupNode` GQL schema. (#2409)
- Add support for PureStorage RapidFiles Toolkit v2 (#2419)
Improvements
- Remove database-level foreign key constraints in `vfolders.{user,group}` columns to decouple the timing of vfolder deletion and user/group deletion. (#2404)
Fixes
- Add missing `commit_session_to_file` to `OP_EXC` (#2127)
- Pass ImageRef.canonical in `commit_session_to_file` (#2134)
- Skip cleaning up containerless kernels that are still creating their containers. (#2317)
- Run batch execution after the batch session starts. (#2327)
- Add support for configuring `sync_container_lifecycles()` task. (#2338)
- Restrict GraphQL query to `user_nodes` field to require `superadmin` privilege (#2401)
- Utilize `ExtendedJSONEncoder` for error logging to handle `UUID` objects in `extra_data` (#2415)
- Change outdated references in event module from `kernels` to `sessions`. (#2421)
- Upgrade `inquirer` to remove dependency on deprecated `distutils`, which breaks execution of the scie builds (#2424)
- Allow specific status of vfolders to query to purge. (#2429)
- Update the install-dev scripts to use `pnpm` instead of `npm` to speed up installation and resolve some peculiar version resolution issues related to esbuild. (#2436)
- Fix a packaging issue in the `backendai-webserver` scie executable due to missing explicit requirement of setuptools (#2454)
- Improve pruning of non-physical filesystems when measuring disk usage in agents (#2460)
External Dependency Updates
- Upgrade aiodocker to 0.22.1 to fix error handling when trying to extract the log of non-existing containers (#2402)
- Upgrade the base CPython from 3.12.2 to 3.12.4 (#2449)
Miscellaneous
- Handle container creation exception and start exception in separate try-except contexts. (#2316)
Full Changelog
Check out the full changelog until this release (24.03.7b1).
Full Commit Logs
Check out the full commit logs between release (24.03.7a2) and (24.03.7b1).
24.03.7a2
Features
- Add support for fetching container logs of a specific kernel. (#2364)
Fixes
- Fix buggy resolver of `model_card` GQL Query. (#2161)
- Keep `sync_container_lifecycles()` bgtask alive in a loop. (#2178)
- Shut down the agent properly by removing code that waits on a cancelled task. (#2392)
Miscellaneous
- Finally stabilize the hanging tests in our CI due to docker-internal races on TCP port mappings to concurrently spawned fixture containers by introducing monotonically increasing TCP port numbers (#2379)
- Further improve the monotonic port allocation logic for the test containers to remove maximum concurrency restrictions (#2396)
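The idea behind the monotonic port allocation in the two entries above can be sketched as follows: each test container receives the next port from a shared counter, so a port is never handed out twice within one test run. This is an illustrative sketch only, not the project's test fixture code; the class name `MonotonicPortAllocator`, the default port range, and the use of a thread lock are assumptions.

```python
import itertools
import threading


class MonotonicPortAllocator:
    """Hypothetical allocator handing out strictly increasing TCP port numbers."""

    def __init__(self, start: int = 10000, end: int = 65535) -> None:
        self._counter = itertools.count(start)
        self._end = end
        self._lock = threading.Lock()

    def allocate(self) -> int:
        # The lock ensures concurrent callers never receive the same port.
        with self._lock:
            port = next(self._counter)
        if port > self._end:
            raise RuntimeError("port range exhausted")
        return port


allocator = MonotonicPortAllocator(start=20000)
first = allocator.allocate()
second = allocator.allocate()
```

Because ports are never reused, a fixture container that is still shutting down cannot collide with a newly spawned one.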
Full Changelog
Check out the full changelog until this release (24.03.7a2).
Full Commit Logs
Check out the full commit logs between release (24.03.7a1) and (24.03.7a2).
24.03.7a1
Features
- Allow superadmins to force-update session status through destroy API. (#2275)
- Introduce Python native WSProxy (#2372)
Fixes
- Fix user creation error when any model-store does not exist. (#2160)
- Ensure that utilization idleness is checked after a set period. (#2205)
- Fix `ZeroDivisionError` in volume usage calculation by returning 0% when volume capacity is zero (#2245)
- Fix GraphQL to support query to non-installed images (#2250)
- Add missing `push_image` method implementation to Dummy Agent (#2253)
- Corrected an issue where the `resource_policy` field in the user model was incorrectly mapped to `domain_name`. (#2314)
- Fix mismatches between responses of `/services/_runtimes` and new model service creation input (#2371)
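The ZeroDivisionError fix above (#2245) describes a guard that returns 0% when the volume capacity is zero. A minimal sketch of that kind of guard, assuming a hypothetical helper named `usage_percent` (the actual function and field names in the project may differ):

```python
def usage_percent(used_bytes: int, capacity_bytes: int) -> float:
    """Return used/capacity as a percentage, or 0.0 when capacity is zero."""
    if capacity_bytes == 0:
        # Dividing by a zero capacity would raise ZeroDivisionError.
        return 0.0
    return used_bytes / capacity_bytes * 100
```

For example, `usage_percent(50, 200)` yields `25.0`, while `usage_percent(10, 0)` yields `0.0` instead of raising.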
External Dependency Updates
- Upgrade aiodocker to v0.22.0 with minor bug fixes found by improved type annotations (#2339)
Full Changelog
Check out the full changelog until this release (24.03.7a1).
Full Commit Logs
Check out the full commit logs between release (24.03.6) and (24.03.7a1).
24.03.6
Fixes
- Fix model service sessions created before 24.03.5 failing to spawn (#2318)
- Fix image commit not working (#2319)
- Fix model service session scheduler (`scale_services()`) failing when sessions bound to an active route are already marked as terminated (#2320)
- Fix container metric collection halted on systems with Cgroups v1 (#2321)
Full Changelog
Check out the full changelog until this release (24.03.6).
Full Commit Logs
Check out the full commit logs between release (24.03.5) and (24.03.6).
24.03.5
Features
- New redis client (experimental) (#2041)
- Add support for CentOS 8 based kernels (#2220)
- Allow modifying model service session's environment variable setup (#2255)
- Add `endpoint.runtime_variant` column (#2256)
- Add new API to show list of supported inference runtimes (#2258)
- Add support for model service provisioning without `model-definition.yaml` (#2260)
Fixes
- Always update the session's occupying resources in the DB when a kernel starts. (#1832)
- Rename no-op `access_key` parameter of `endpoint_list` GQL Query to `user_uuid` (#2287)
- Fix `ai.backend.service-ports` label syntax broken when image does not expose built-in service port (#2288)
- Improve stability of `untag_image_from_registry` mutation (#2289)
- Fix SSH not working between kernels started with customized image (#2290)
- Fix invalid container memory capacity being reported (#2291)
Full Changelog
Check out the full changelog until this release (24.03.5).
Full Commit Logs
Check out the full commit logs between release (24.03.5rc1) and (24.03.5).
24.03.5rc1
No significant changes.
Full Changelog
Check out the full changelog until this release (24.03.5rc1).
Full Commit Logs
Check out the full commit logs between release (24.03.5b1) and (24.03.5rc1).
24.03.5b1
Features
- New redis client (experimental) (#2041)
- Add support for CentOS 8 based kernels (#2220)
- Allow modifying model service session's environment variable setup (#2255)
- Add `endpoint.runtime_variant` column (#2256)
- Add new API to show list of supported inference runtimes (#2258)
- Add support for model service provisioning without `model-definition.yaml` (#2260)
Fixes
- Always update the session's occupying resources in the DB when a kernel starts. (#1832)
- Rename no-op `access_key` parameter of `endpoint_list` GQL Query to `user_uuid` (#2287)
- Fix `ai.backend.service-ports` label syntax broken when image does not expose built-in service port (#2288)
- Improve stability of `untag_image_from_registry` mutation (#2289)
- Fix SSH not working between kernels started with customized image (#2290)
- Fix invalid container memory capacity being reported (#2291)
Full Changelog
Check out the full changelog until this release (24.03.5b1).
Full Commit Logs
Check out the full commit logs between release (24.03.4) and (24.03.5b1).
24.03.4
Features
- Allow user to explicitly set filename of model definition YAML (#2063)
- Revamp images GQL query by changing image filtering from flag-based to feature set-based and add `aliases` field to customized image GQL schema (#2136)
- Added missing fields for `keypair_resource_policy` in client-py, models, etc. (#2146)
- Add parameters to `check-presets` SDK function (#2153)
- Add relay-aware `VirtualFolderNode` GQL Query (#2165)
- Also perform basic model service validation process when updating model service via `ModifyEndpoint` (#2167)
- Add support for mounting arbitrary VFolders on model service session (#2168)
- Clear zombie routes automatically (#2229)
Fixes
- Let the `backend.ai mgr clear-history` command clear session records as well as kernel records (#2077)
- Fix orphan model service routes being created (#2096)
- Fix initialization of the resource usage API's kernel-level usage aggregation (#2102)
- Fix model server starting on every kernel (including sub role kernels) on multi-container inference session (#2124)
- Handle fileset-already-exists response of `create-filset` API request and make sure to wait between all GPFS job polling iterations (#2144)
- Fix error when calling `check_presets` Client SDK API with an invalid `group` parameter, and rewrite Client SDK to access all APIConfig fields (#2152)
- Ensure that all pending sessions are picked by schedulers (#2155)
- Fix security vulnerability for `sudo_session_enabled` (#2162)
- Rename `endpoints.model_mount_destiation` to `model_mount_destination` (#2163)
- Wait for real quota scope directory creation after Netapp `create_qtree()` call (#2170)
- Fix wrong per-user concurrency calculation logic (#2175)
- Fix model service persisting on `degraded` status forever in rare chance when trying to delete the service (#2191)
- Fix error when querying or mutating GraphQL using the `BigInt` field type (#2203)
- Fix `backend.ai ssh` command execution when packaged as SCIE/PEX (#2226)
- Fix `endpoints` query not working when trying to load `image_row.aliases`, and fix `endpoints.status` reporting `PROVISIONING` when its status is in `DESTROYING` state (#2233)
- Fix GQL raising error when trying to resolve `endpoints.errors` field occasionally (#2236)
Miscellaneous
- Fix incorrect version notation of GQL Field. (#1993)
- Add max_pending_session_count field to Keypair resource policy GQL schema (#2013)
Full Changelog
Check out the full changelog until this release (24.03.4).
Full Commit Logs
Check out the full commit logs between release (24.03.4rc1) and (24.03.4).