Blog

  • vitepress


    🔥 Edit on VS Code   ⚡️ Edit on StackBlitz

    Two kinds of mirror supported by Gitee are currently configured:

    • Push: automatically mirrors the Gitee repository to GitHub
    • Pull: mirrors the GitHub repository to Gitee

    Code synced through mirroring does not count toward the contribution graph of the mirrored repository.

    Since Gitee no longer offers the mirror-sync feature for free, a GitHub Actions workflow (/script/sync-gitee.yml) is used instead: pushing code to GitHub automatically syncs it to the Gitee mirror repository.

    Tip: the deployment path of Gitee Pages is all lowercase, while the URL generated by GitHub Pages follows the repository name and is case-sensitive.

    Continuous Integration

    On GitHub, the GitHub Actions CI service is used.

    On Gitee: Gitee Go is Gitee's newly launched CI/CD tool, but I run a local script instead.

    Gitee Go is a paid add-on, billed as prepaid build minutes; even the paid enterprise plans do not include value-added services such as Gitee Go. 😰

    Deploy automatically after pushing to the master branch:

    name: Deploy
    on:
      push:
        branches:
          - master

    The Algolia crawler runs every Friday at 03:00:

    name: Algolia
    on:
      schedule:
        - cron:  '0 3 * * 5'

    The Algolia free plan has limits, so the crawler cannot run on every push; otherwise:

    Github Action Error: Crawling issue: nbHits 0 for XXX

    Cause: You have exceeded your Free app’s 10,000 Record limit. You can delete records or indices, or upgrade at any time for increased capacity.

    Tip: during periods of high load on GitHub Actions, schedule events may be delayed. High-load times include the start of every hour. To reduce the chance of delay, schedule your workflow to run at a different time.

    Other users report delays of tens of minutes or more than an hour; in extreme cases the run may not execute at all.

    So the cron time set in a schedule is only the time at which the workflow is queued, not the exact time it runs. Note also that all the times above are in UTC, not Beijing time.

    To convert to Beijing time, add eight hours to the cron time. For example, 0 1 * * * triggers at 1:00 AM UTC, which is 9:00 AM Beijing time.
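
    To double-check such a conversion, here is a tiny Python sketch (assuming Python 3.9+ for the standard zoneinfo module); the date used is arbitrary.

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo  # standard library since Python 3.9

    # A workflow scheduled with cron "0 1 * * *" is queued at 01:00 UTC.
    utc_fire_time = datetime(2024, 1, 1, 1, 0, tzinfo=timezone.utc)

    # Convert to Beijing time (Asia/Shanghai, UTC+8).
    beijing_time = utc_fire_time.astimezone(ZoneInfo("Asia/Shanghai"))
    print(beijing_time.strftime("%H:%M"))  # prints 09:00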

    • StackBlitz operates on GitHub directly, which triggers the repository mirroring and then syncs the changes to Gitee

    The “Open in StackBlitz” button

    One of the ways to make your code example stand out in your docs or your repository’s readme file is to use our CTA (call-to-action) buttons.

    Open in StackBlitz

    To allow third-party cookies for all StackBlitz projects, go to your browser's cookie preferences and add exceptions for the following URL patterns:

    https://[*.]stackblitz.io
    https://[*.]local.webcontainer.io
    https://[*.]local-credentialless.webcontainer.io
    https://[*.]local-corp.webcontainer.io
    

    Use Codespaces to build and run in the browser:

    "dev:codespace": "npm run dev -- --host 0.0.0.0"

    Dependabot version updates are available free of charge for all repositories on GitHub.com.

    version: 2
    updates:
      - package-ecosystem: "npm" # See documentation for possible values
        directory: "/" # Location of package manifests
        schedule:
          interval: "monthly"
        commit-message:
          # Prefix all commit messages with "npm"
          prefix: "npm level up"
    Visit original content creator repository https://github.com/NidhoggDJoking/vitepress
  • csv_to_qlab

    CSV to QLAB

    To run on Mac:

    • download csv_to_qlab.dmg from the latest release
    • unzip the folder
    • open the app
      • QLab must be open on the receiving computer in order for the messages to be received.

    Please note that I do not currently have an Apple Developer Certificate and therefore there will be some scary warnings when trying to run this application locally. It is entirely up to you to decide to run this application. If you have concerns with the bundled application releases, I suggest cloning or forking the repository.

    How to format your csv file:

    Some columns are required, some are optional.

    Required columns

    • Number
    • Type
    • Name

    Number Type Name
    12 start Cue 12 GO

    Optional Columns

    • Notes
    • Follow
      • 0 – No Follow
      • 1 – Auto-Continue
      • 2 – Auto-Follow
    • Color (Options)
    • Target
    • File Target
    • Columns available for “midi” cue type:
      • MIDI Q Number
      • MIDI Device ID
      • MIDI Message Type
        • 1 – MIDI Voice Message (“Musical MIDI”)
        • 2 – MIDI Show Control Message (MSC)
        • 3 – MIDI SysEx Message
      • MIDI Control Number
      • MIDI Control Value
      • MIDI Patch Channel
      • MIDI Patch Number
      • MIDI Q List
      • MIDI Command Format (Options)
      • MIDI Command (Options)
    • Columns available for “network” cue type:
      • QLab 5
        • Network Patch Number
        • Network Patch Channel
        • Custom String
      • QLab 4
        • Message Type (Options)
        • OSC Cue Number (Only if using QLab Message Type)
        • Command
          • For QLab Messages (Options)
          • For an OSC message, you may now include a raw string in this column

    Examples
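
    As a rough illustration only (not one of the repository's bundled examples), a minimal cue sheet with the required columns plus two optional ones could be generated with Python's standard csv module; the column names follow the lists above.

    import csv

    # Hypothetical cue sheet: required columns (Number, Type, Name) plus the
    # optional Notes and Follow columns. "midi" and "network" cues would add
    # their own columns as listed above.
    rows = [
        {"Number": "12", "Type": "start", "Name": "Cue 12 GO", "Notes": "House to half", "Follow": "1"},
        {"Number": "13", "Type": "start", "Name": "Cue 13 GO", "Notes": "", "Follow": "0"},
    ]

    with open("cues.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["Number", "Type", "Name", "Notes", "Follow"])
        writer.writeheader()
        writer.writerows(rows)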

    To run in development:

    python3 -m pip install --upgrade pip
    python3 -m pip install -r requirements.txt
    
    • Run:
    python3 application.py
    
    • The application was bundled for distribution using pyinstaller. To re-bundle, install pyinstaller:
    python3 -m pip install pyinstaller
    
    • Then run:
    pyinstaller application.spec
    

    If you want to run some tests:

    • Install Pytest
    pip install pytest
    
    • Run Pytest
    pytest
    

    Recommendations for future features are very welcome!

    Visit original content creator repository
    https://github.com/fross123/csv_to_qlab

  • influxdb

    swarmstack/influxdb

    Docker compose file for InfluxDB OSS version, also useful for Prometheus long-term storage.

    See https://github.com/swarmstack/victoria-metrics for a highly performant time-series database that can be used for Prometheus metrics long-term storage; it uses significantly less RAM and supports higher-cardinality time-series data than the OSS version of InfluxDB.

    DEPLOY INFLUXDB AS A STACK

    INFLUXDB_ADMIN_USER='admin' \
    INFLUXDB_ADMIN_PASSWORD='admin' \
    INFLUXDB_USER='prometheus' \
    INFLUXDB_USER_PASSWORD='prompass' \
    docker stack deploy -c docker-compose.yml influxdb
    

    Or you can take some or all of the defaults above:

    docker stack deploy -c docker-compose.yml influxdb
    

    swarmstack users should use docker-compose-swarmstack.yml instead.

    PROMETHEUS REMOTE READ/WRITE DATABASE (Optional)

    Add remote-write and remote-read stanzas to your Prometheus configuration in order to use InfluxDB to store Prometheus metrics longer-term. swarmstack users can add the below to localswarmstack/prometheus/conf/prometheus.yml, otherwise just substitute http://influxdb with your Influx address:

    alerting:
      alertmanagers:
      - static_configs:
        - targets: [ 'alertmanager:9093', 'alertmanagerB:9093' ]
    
    remote_write:
      - url: "http://influxdb:8086/api/v1/prom/write?db=prometheus&u=prometheus&p=prompass"
    
    remote_read:
      - url: "http://influxdb:8086/api/v1/prom/read?db=prometheus&u=prometheus&p=prompass"
    

    GRAFANA DASHBOARD

    A Grafana dashboard which nicely visualizes all ‘internal’ InfluxDB OSS metrics documented at InfluxData.com and exported via influxdb_stats_exporter into Prometheus.

    Visit original content creator repository https://github.com/swarmstack/influxdb
  • llm-table-survey

    LLM-Table-Survey


    📄 Paper List

    Large Language Model

    • GPT-3, Language Models are Few-Shot Learners. NeurIPS 20. [Paper]
    • T5, Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. [Paper]
    • FLAN, Finetuned Language Models Are Zero-Shot Learners. ICLR 22. [Paper] [Code]
    • DPO, Direct Preference Optimization: Your Language Model is Secretly a Reward Model. NeurIPS 23. [Paper]
    • PEFT, The Power of Scale for Parameter-Efficient Prompt Tuning. EMNLP 21. [Paper]
    • LoRA, LoRA: Low-rank Adaptation of Large Language Models. ICLR 22. [Paper]
    • Chain-of-thought Prompting, Chain-of-thought prompting elicits reasoning in large language models. NeurIPS 22. [Paper]
    • Least-to-most Prompting, Least-to-most prompting enables complex reasoning in large language models. ICLR 23. [Paper]
    • Self-consistency Prompting, Self-consistency improves chain of thought reasoning in language models. ICLR 23. [Paper]
    • ReAct, ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 23. [Paper] [Code]

    Pre-LLM Era Table Training

    • TaBERT, TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data. ACL 20 Main. [Paper] [Code]
    • TaPEx, TAPEX: Table Pre-training via Learning a Neural SQL Executor. ICLR 22. [Paper] [Code] [Models]
    • TABBIE, TABBIE: Pretrained Representations of Tabular Data. NAACL 21 Main. [Paper] [Code]
    • TURL, TURL: Table Understanding through Representation Learning. VLDB 21. [Paper] [Code]
    • RESDSQL, RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL. AAAI 23. [Paper] [Code]
    • UnifiedSKG, UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models. EMNLP 22 Main. [Paper ] [Code]
    • SpreadsheetCoder, SpreadsheetCoder: Formula Prediction from Semi-structured Context. ICML 21. [Paper] [Code]

    Table Instruction-Tuning

    Code LLM

    Hybrid of Table & Code

    Parameter-Efficient Fine-Tuning

    Direct Preference Optimization

    • SENSE, Synthesizing Text-to-SQL Data from Weak and Strong LLMs. ACL 24. [Paper]

    Small Language Model + Large Language Model

    • ZeroNL2SQL, Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL. VLDB 24. [Paper]

    Multimodal Table Understanding & Extraction

    • LayoutLM, LayoutLM: Pre-training of Text and Layout for Document Image Understanding. KDD 20. [Paper]
    • PubTabNet, Image-Based Table Recognition: Data, Model, and Evaluation. ECCV 20. [Paper] [Code & Data]
    • Table-LLaVA, Multimodal Table Understanding. ACL 24. [Paper] [Code] [Model]
    • TableVLM, TableVLM: Multi-modal Pre-training for Table Structure Recognition. ACL 23. [Paper]
    • PixT3, PixT3: Pixel-based Table-To-Text Generation. ACL 24. [Paper]

    Representation

    • Tabular representation, noisy operators, and impacts on table structure understanding tasks in LLMs. NeurIPS 2023 second table representation learning workshop. [Paper]
    • SpreadsheetLLM, SpreadsheetLLM: Encoding Spreadsheets for Large Language Models. arXiv 24. [Paper]
    • Enhancing Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies. EMNLP 23. [Paper] [Code]
    • Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs. arXiv 24. [Paper]

    Prompting

    NL2SQL

    • The Dawn of Natural Language to SQL: Are We Fully Ready? VLDB 24. [Paper] [Code]
    • MCS-SQL, MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation. [Paper]
    • DIN-SQL, DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction Prompting, Decompose. NeurIPS 23. [Paper] [Code]
    • DAIL-SQL, Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation. VLDB 24. [Paper] [Code]
    • C3, C3: Zero-shot Text-to-SQL with ChatGPT. arXiv 24. [Paper] [Code]

    Table QA

    • Dater, Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning. SIGIR 23. [Paper] [Code]
    • Binder, Binding language models in symbolic languages. ICLR 23. [Paper] [Code]
    • ReAcTable, ReAcTable: Enhancing ReAct for Table Question Answering. VLDB 24. [Paper] [Code]
    • E5, E5: Zero-shot Hierarchical Table Analysis using Augmented LLMs via Explain, Extract, Execute, Exhibit and Extrapolate. NAACL 24. [Paper] [Code]
    • Chain-of-Table, Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding. ICLR 24. [Paper]
    • ITR, An Inner Table Retriever for Robust Table Question Answering. ACL 23. [Paper]
    • LI-RAGE, LI-RAGE: Late Interaction Retrieval Augmented Generation with Explicit Signals for Open-Domain Table Question Answering. ACL 23. [Paper]

    Spreadsheet

    • SheetCopilot, SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models Agent. NeurIPS 23. [Paper] [Code]
    • SheetAgent, SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models. arXiv 24. [Paper]
    • Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities. arXiv 24. [Paper]

    Multi-task Framework

    • StructGPT, StructGPT: A General Framework for Large Language Model to Reason over Structured Data. EMNLP 23 Main. [Paper] [Code]
    • TAP4LLM, TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning. arXiv 23. [Paper]
    • UniDM, UniDM: A Unified Framework for Data Manipulation with Large Language Models. MLSys 24. [Paper]
    • Data-Copilot, Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow. arXiv 23. [Paper] [Code]

    Tools

    • LlamaIndex
    • PandasAI
    • Vanna
    • DB-GPT. DB-GPT: Empowering Database Interactions with Private Large Language Models. [Paper] [Code]
    • RetClean. RetClean: Retrieval-Based Data Cleaning Using Foundation Models and Data Lakes. [Paper] [Code]

    Survey

    • A Survey of Large Language Models. [Paper]
    • A Survey on Large Language Model Based Autonomous Agents. [Paper]
    • Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks. [Paper]
    • Transformers for tabular data representation: A survey of models and applications. [Paper]
    • A Survey of Table Reasoning with Large Language Models. [Paper]
    • A survey on table question answering: Recent advances. [Paper]
    • Large Language Models (LLMs) on Tabular Data – A Survey. [Paper]
    • A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions. [Paper]

    📊 Datasets & Benchmarks

    Benchmarks

    Name Keywords Artifact Paper
    MBPP Code link arXiv 21
    HumanEval Code link arXiv 21
    Dr.Spider NL2SQL, Robustness link ICLR 23
    WikiTableQuestions Table QA link ACL 15
    WikiSQL Table QA, NL2SQL link arXiv 17
    TabFact Table Fact Verification link ICLR 20
    HybridQA Table QA link EMNLP 20
    FeTaQA Table QA link TACL 22
    RobuT Table QA link ACL 23
    AnaMeta Table Metadata link ACL 23
    GPT4Table Table QA, Table-to-text link WSDM 24
    ToTTo Table-to-text link EMNLP 20
    SpreadsheetBench Spreadsheet Manipulation link NeurIPS 24
    BIRD NL2SQL link NeurIPS 23
    Spider NL2SQL link EMNLP 18
    Dr.Spider NL2SQL link ICLR 23
    ScienceBenchmark NL2SQL link VLDB 24
    DS-1000 Data Analysis link ICML 23
    InfiAgent-DABench Data Analysis link ICML 24
    TableBank Table Detection link LREC 20
    PubTabNet Table Extraction link ECCV 20
    ComTQA Visual Table QA, Table Detection, Table Extraction link arXiv 24

    Datasets

    Name Keywords Artifact Paper
    TableInstruct Table Instruction Tuning link arXiv 23
    WDC Web Table link WWW 16
    GitTables GitHub CSVs link SIGMOD 23
    DART Table-to-text link NAACL 21
    MMTab Multimodal Table Understanding link ACL 24
    SchemaPile Database Schemas link SIGMOD 24


    Visit original content creator repository
    https://github.com/godaai/llm-table-survey

  • sent

    sent is a simple plaintext presentation tool.

    sent does not need latex, libreoffice or any other fancy file format, it uses
    plaintext files and png images. Every paragraph represents a slide in the
    presentation.

    The presentation is displayed in a simple X11 window. The content of each slide
    is automatically scaled to fit the window and centered so you also don’t have to
    worry about alignment. Instead you can really concentrate on the content.

    Demo

    To get a little demo, just type

    make && ./sent example
    

    You can navigate with the arrow keys and quit with q.

    Usage

    sent FILE1 [FILE2 ...]
    

    If one FILE equals -, stdin will be read. Produce image slides by prepending a
    @ in front of the filename as a single paragraph. Lines starting with # will
    be ignored. A \ at the beginning of the line escapes @ and #. A
    presentation file could look like this:

    sent
    
    @nyan.png
    
    depends on
    - Xlib
    - libpng
    
    sent FILENAME
    one slide per paragraph
    # This is a comment and will not be part of the presentation
    \# This and the next line start with backslashes
    
    \@FILE.png
    
    thanks / questions?
    

    Visit original content creator repository
    https://github.com/mrinjamul/sent

  • heimdall

    heimdall


    Heimdall is a self-hosted email alias/forwarding service. I built this as a privacy tool to fight spam and also better manage access to my personal email address. As a self-hosted and self-managed service, you have complete control over your data. With 3rd party email forwarding services, you are forced to trust a company with your emails.

    This has also been a really fun project for me to learn more about AWS and the Serverless framework.

    Check out: How I built Heimdall, an open-source personal email guardian.

    Changelog can be found under Releases.

    Motivations

    1. With Heimdall, you completely own and manage your data and the service. No feature limitations or having to trust a third-party company with your data.
    2. Heimdall is meant for individual users to deploy and use and contains user-friendly setup instructions.
    3. Heimdall is easy to run – it utilizes the idea of serverless computing, so there is zero server configuration or provisioning.
    4. Heimdall is easy to deploy – it uses the Serverless framework (not to be confused with small-letter serverless in Point 3 above) so you can deploy with a single command.

    Features

    Overview

    1. Receive safely: Receive emails on single-use aliases and forward them to your personal inbox.
    2. Reply anonymously: Reply to emails from your alias without revealing your personal email address.
    3. Attachments: Attachments are supported on incoming and outgoing emails (subject to size limits – see below).
    4. Email commands: Manage your aliases through email directly – no separate app or website required.
    5. Usage stats: Easily check the usage stats of each alias.

    Receiving emails

    Heimdall operates as a whitelisting (default-deny) service. All incoming emails to your domain are rejected by default unless they are to valid aliases. Emails received on valid aliases will be forwarded to your personal email address.

    Forwarded emails will preserve metadata information, such as any other recipients in the “to” or “CC” headers.

    Replying

    To reply, simply reply normally to the received email. Other recipients in the original email will not receive your reply.

    You may include other recipients in the “to” and “CC” list, either by manually inserting them, or using “reply-all”.

    Note: If you do that, you will disclose your email address to them. However, the original sender will still not be able to see your email address, provided you are replying to the original sender through the alias. The original sender will also not be able to see the other recipients.

    Attachments

    Attachments are supported, although size limits apply to the entire email message. This is a hard limitation imposed by AWS and cannot be circumvented. See Limitations below.

    Commands

    To interact with the service, send a single email to one of the following email addresses.

    Generate an alias

    Email generate@yourverifieddomain.com with the description as the subject. You will receive the generated alias as a reply.

    The description lets you identify an alias and its use. E.g. “Sign up for Service X”.

    Screenshot
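
    Normally you would just send this from your regular mail client; purely as an illustration, the same command could be issued programmatically with Python's standard smtplib. The SMTP host, port and credentials below are placeholders for your own mail provider.

    import smtplib
    from email.message import EmailMessage

    msg = EmailMessage()
    msg["From"] = "you@example.com"                 # your personal address
    msg["To"] = "generate@yourverifieddomain.com"   # Heimdall command address
    msg["Subject"] = "Sign up for Service X"        # becomes the alias description

    # Placeholder SMTP settings: substitute your own provider and credentials.
    with smtplib.SMTP_SSL("smtp.example.com", 465) as smtp:
        smtp.login("you@example.com", "app-password")
        smtp.send_message(msg)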

    List aliases

    Email list@yourverifieddomain.com. You will receive a list of all aliases as a reply.

    Dev note: This reads up to a maximum of 1MB of data (due to AWS’s limitations).

    Remove an alias

    Email remove@yourverifieddomain.com with the alias as the title (case-sensitive). You will receive the operation outcome (success/failure) as a reply.

    Usage stats

    Email info@yourverifieddomain.com with the alias as the title (case-sensitive). You will receive usage information for the particular alias.

    Supported usage stats:

    • Alias creation date
    • Emails received
    • Emails sent
    • Date of last received email
    • Date of last sent email

    Update an alias

    Coming soon – not supported yet.

    Known Limitations

    Received emails must be <30MB. Outgoing emails must be <10MB.

    Setup

    Pre-requisites: You need to own a domain and have an AWS account. For reasonable use cases, you should not exceed AWS’s free tier (which is very generous). You should also already have Yarn and NodeJS installed.

    Optional: To be able to reply to emails, you need to request AWS Support to un-sandbox your SES account.

    1. Add and verify your domain in AWS Simple Email Service (SES).
    2. In AWS’s SES console, generate a set of SMTP credentials. Take note of that, and also your connection information on SES’s “SMTP Settings” page.
    3. Populate required environment variables in .env.sample, and rename to .env. It is important that EMAIL matches your personal email exactly. Also note that you should avoid port 25, due to AWS’s default blocking of outbound traffic.
    4. Run yarn global add serverless. Then, check out Serverless’s guide to set up Serverless’s credentials for accessing your AWS account programmatically.
    5. Run yarn install.
    6. Set up Serverless, then run yarn run deploy-prod.
    7. Add a receipt rule in SES to trigger your S3 bucket (created in step 6). For “recipients”, enter your domain name (e.g. yourverifieddomain.com). Preferably, name your rule descriptively (e.g. prod).

    Development (optional)

    If you want to build new features or tweak existing features, you can set up a parallel development environment that runs alongside production (above).

    1. Ensure that the DEV_SUBDOMAIN environment variable is set in .env (e.g. test).
    2. Run yarn run deploy-dev. This creates a parallel development CloudFormation stack.
    3. Add a new receipt rule in SES before your production rule to trigger your development S3 bucket. For “recipients”, enter the same test subdomain as you set in step 1 (e.g. test.yourverifieddomain.com). Preferably, name your rule descriptively (e.g. dev).

    Note: You need to update your DNS records for test.yourverifieddomain.com as you did when verifying your domain for AWS SES.

    Migration

    To run migration scripts, first compile using tsc scripts/migrate_vX.ts, then run using node scripts/migrate_vX.js.

    Visit original content creator repository https://github.com/fterh/heimdall
  • whitespacy

    Whitespacy

    Whitespacy is a polyglot formatter, written in Python, for the C and Whitespace programming languages.

    It takes as input a valid C file and a valid Whitespace file, and produces, as output, a polyglot file that is valid in both C and Whitespace, while behaving exactly like the inputs when interpreted/compiled.

    Whitespacy also includes minic.py, a simple C-minifier.

    But why ?

    The goal of the project was to demonstrate that it is possible to embed a fully functional Whitespace program within the whitespace characters ( , \t and \n) of a program written in another language.

    Is it useless? For sure. Is it trivial? Hell no.

    Dependencies

    Whitespacy only uses the standard libraries of Python. However, if you wish to compile the C files, you will need a C compiler like gcc or clang.

    To interpret the Whitespace files, I have used an online Whitespace interpreter.

    Example

    Let’s take as inputs this (nice) “Hello, World!” C program

    #include <stdio.h>
    
    #define NICE 69420
    
    int isNice(int x) {
        return x == NICE;
    }
    
    /* tricky quote " */
    #define min(x, y) \
    ((x) < (y) ? (x) : (y))
    
    int main() {
        printf("Hello, World!\n");
    
        if (isNice(3 * 4 * 5 * min(13, 31) * 89))
            printf("nice.\n");
    
        /* tricky //
           string */
        if (0)
            printf("/* */ \" // \
            ");
    
        return 0; // no error
    }

    and a basic “Hello, World!” Whitespace program (see hello-world.ws).

    Then, running the command

    $ python whitespacy.py hello-world.c hello-world.ws -o polyglot.c

    produces the polyglot.c file

     #             include<stdio.h>
        
    #  define            NICE       69420
        
     int isNice(int x)      {            return x==NICE;}
        
    # define min(x , y)( (x)<(y )?      (x):(y)   )
        
                                
        
                      
        
    int main( )   {   printf("Hello,\x20World!\n" )  
        
      ;if(isNice(3*                 4*5*        
    min(13,31   
      ) *89)) printf("nice.\n")     ;if  (          0   )printf
        ("/*\x20*/\x20\"\x20//\x20\x20\x20\x20\x20\x20\x20\x20\x20")
     ; return 0                  
        ;
                          
        
                      
        
                    
        
      
    
    
    }

    which can be compiled with gcc (or clang)

    $ gcc polyglot.c -o polyglot
    $ ./polyglot
    Hello, World!
    nice.

    or interpreted in Whitespace:

    Hello, world!

    It should be noted that the output of whitespacy.py is different for each execution, as (part of) the formatting is randomly generated.

    Visit original content creator repository
    https://github.com/francois-rozet/whitespacy

  • myanmar-phone-number-validator-ts

    📞 Myanmar Phone Number Validator 🇲🇲

    Validate and decode Myanmar phone numbers with ease using this TypeScript library! It’s an evolution of the original JavaScript library by Kaung Myat Lwin, now enhanced to fully support TypeScript. 🚀

    Installation 📦

    To install this package, simply run:

    npm install myanmar-phone-number-validator

    Usage 🛠️

    This package offers a myanmarPhoneNumber object packed with helpful functions:

    • isValidMMPhoneNumber(phoneNumber: string): boolean: Verifies if a string is a valid Myanmar phone number, returning true for valid and false for invalid numbers.

    import { myanmarPhoneNumber } from 'myanmar-phone-number-validator';
    
    const phoneNumber = '0949880111';
    if (myanmarPhoneNumber.isValidMMPhoneNumber(phoneNumber)) {
        // It's a valid phone number!
    } else {
        // Oops, invalid phone number!
    }
    • getTelecomName(phoneNumber: string): string: Retrieves the name of the telecom operator associated with a phone number, or “Unknown” if it can’t be determined.

    import { myanmarPhoneNumber } from 'myanmar-phone-number-validator';
    
    const phoneNumber = '0949880111';
    const telecomName = myanmarPhoneNumber.getTelecomName(phoneNumber);
    • getPhoneNetworkType(phoneNumber: string): string: Determines the network type of a phone number, returning “Unknown” if it can’t be determined.

    import { myanmarPhoneNumber } from 'myanmar-phone-number-validator';
    
    const phoneNumber = '0949880111';
    const networkType = myanmarPhoneNumber.getPhoneNetworkType(phoneNumber);

    License 📜

    This project operates under the MIT License.

    Credit 🙌

    Huge thanks to Kaung Myat Lwin for creating the original JavaScript library that inspired this one! 👏

    Visit original content creator repository
    https://github.com/minmyatoo/myanmar-phone-number-validator-ts

  • sentiment-puccini

    Sentiment Analysis on Puccini letters

    Project 7

    Puccini by mail

    In collaboration with the Ricordi Archive, Dr. Patrizia Rebulla and Valeria Luti

    2024 will mark the centenary of the death of Giacomo Puccini, one of the greatest authors published by Casa Ricordi.
    The great interest in Puccini lies not only in his universal fame and the constant, still-current success of his works, but also in his language, rich in Tuscanisms and inventions. Puccini's letters can be studied from the perspective of sentiment analysis in order to link the texts to several aspects of his temperament: the periods of depression and discouragement from which he suffered, his insecurity about his own abilities, and certain difficulties in his relationships with his librettists.

    The Ricordi Archive keeps 381 letters written by Puccini to various recipients at Casa Ricordi, and 1387 letters sent to him by the publishing house. To these are added another 120-130 letters that are present in the database but not kept in the archive. In total, therefore, there are about 2000 letters to analyze.

    The project aims at studying the letters with aspect-based sentiment analysis techniques in order to extract not only the overall sentiment polarity but also specific aspects and opinions that may be associated with events in Giacomo Puccini's life.

    Dataset

    Provided by the Ricordi Archive link

    Project

    The aim of the project is to retrieve Giacomo Puccini's letters from the official website and to develop and apply pre-trained models for classifying the sentiment polarity of the letters over the years; a rough illustrative sketch follows the model list below.
    Dataset Links: Archivio Ricordi and Sentipolc-evalita16

    Different models applied:

    • Simple Neural Network models
    • SentITA models
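
    As a purely illustrative sketch of this kind of classification (not the project's actual models, which are simple neural networks and SentITA), an off-the-shelf multilingual sentiment model from Hugging Face could be applied to a letter as follows; this assumes the transformers package is installed, and the example sentence is invented.

    from transformers import pipeline

    # Off-the-shelf multilingual sentiment model (1-5 star polarity); a
    # stand-in for illustration, not one of the models used in the project.
    classifier = pipeline(
        "sentiment-analysis",
        model="nlptown/bert-base-multilingual-uncased-sentiment",
    )

    letter = "Caro Giulio, sono scoraggiato: il terzo atto non mi convince affatto."
    print(classifier(letter))  # e.g. [{'label': '2 stars', 'score': ...}]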

    Visit original content creator repository
    https://github.com/Andreaierardi/sentiment-puccini

  • optimize

    Dockerized script to bulk optimize images using libvips / sharp / bun.

    Example terminal output: Total: 23.74MB saved from 8 images

    Modes

    overwrite: Overwrite existing images (default). Scans the directory mounted to /images.

    docker run --rm -v ./images:/images -v ./backup:/backup henrygd/optimize
    

    restore: Restore original images from backup (reverses last overwrite operation).

    docker run --rm -v ./images:/images -v ./backup:/backup -e MODE=restore henrygd/optimize
    

    copy: Write images to different directory. This example converts all images to WEBP.

    docker run --rm -v ./images:/images -v ./optimized:/optimized -e MODE=copy -e FORMAT=webp henrygd/optimize
    

    Environment Variables

    Name Mode Description Default
    EXTENSIONS * Extensions to optimize [1] jpg,jpeg,png,webp,tif,tiff
    FIT * Fit method inside
    FORMAT copy Output format [2] unset
    JOBS * Number of parallel conversion jobs Based on available CPU cores [3]
    MAX_AGE * Age threshold in hours [4] unset
    MAX_HEIGHT * Max height of output image 4000
    MAX_WIDTH * Max width of output image 4000
    MIN_SIZE * Size threshold in kilobytes [5] unset
    MODE * Mode overwrite
    OWNER * Ownership of new files [6] root:root
    QUALITY * Output quality 80
    QUIET * Log only errors, not every file unset

    Fit Methods

    • inside: Preserving aspect ratio, resize the image to be as large as possible while ensuring its dimensions are less than or equal to both those specified.
    • cover: Crop to cover both provided dimensions.
    • contain: Embed within both provided dimensions.
    • fill: Ignore the aspect ratio of the input and stretch to both provided dimensions.
    • outside: Preserving aspect ratio, resize the image to be as small as possible while ensuring its dimensions are greater than or equal to both those specified.

    Footnotes

    1. Uppercase versions of extensions are added automatically.

    2. This will force all optimized images to be converted to the specified format. Possible values: webp, avif.

    3. Default JOBS value is one fewer than half of your available cores. If you have 16 cores, it’s 7 jobs. If you have 4 cores or fewer, it’s only one job.

    4. Images are only optimized if their file content was modified in the last MAX_AGE hours. For example, 24 would only optimize images updated in the last 24 hours.

    5. Images are only optimized if they are larger than MIN_SIZE. For example, 800 would only optimize images larger than 800kB.

    6. This applies only to newly created files. Overwritten files should maintain existing permissions. Value should use IDs. For example: -e OWNER=1000:1000, or -e OWNER="$(id -u):$(id -g)".
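
    To make the JOBS default in footnote 3 concrete, here is a small Python sketch of my reading of that rule: one fewer than half of the available cores, clamped to a minimum of one job.

    import os

    def default_jobs(cores=None):
        # One fewer than half of the available cores, but never fewer than one.
        cores = cores or os.cpu_count() or 1
        return max(1, cores // 2 - 1)

    print(default_jobs(16))  # 7
    print(default_jobs(4))   # 1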

    Visit original content creator repository https://github.com/henrygd/optimize