25,257 questions
1
vote
2
answers
175
views
Can font-size be set by unicode range?
I'd like to change the size of the symbols but they cannot be wrapped in additional HTML tags. Is there a way to target a unicode range when setting font-size? Such as,
span.name[...unicode range...] ...
1
vote
1
answer
101
views
How to search and replace unicode characters with a Word macro?
I am trying to replace several Unicode characters in text strings in Microsoft Word.
The issue is that when I try to use these text strings in other applications, the Unicode characters convert to a ...
Advice
0
votes
5
replies
84
views
How to get the name of characters made up of multiple codepoints with ICU?
I managed to find u_charName() for getting the name of a single character, but what about characters like flag emojis, which are made up of multiple codepoints? Do characters like that even have names?...
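A rough illustration of the "multiple codepoints" point, written in Python rather than ICU: the standard `unicodedata` module, like `u_charName()`, names one code point at a time, so a flag emoji yields two regional-indicator names rather than a single name. The helper `codepoint_names` is a hypothetical name used only for this sketch.

```python
import unicodedata

def codepoint_names(text):
    """Return a (label, name) pair for every code point in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# A flag emoji built from two regional indicator symbols (F + R).
flag = "\U0001F1EB\U0001F1F7"

for label, name in codepoint_names(flag):
    print(label, name)
```

Each code point has its own name; whether the sequence as a whole has a name is a separate matter that this sketch does not answer.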
Best practices
0
votes
3
replies
75
views
OCR output contains “garbage” characters after special symbols (mojibake / control chars) — how to reliably clean before returning from LLM?
I have an on-prem OCR pipeline that returns extracted text inside a JSON blob. I parse the LLM response and call a local normalizer before returning the text to callers. Example call site:
result = ...
-4
votes
1
answer
215
views
Getting weird results from Java string codepoints on a Windows machine [closed]
package edu.practice.zapper;
import java.io.IOException;
import java.lang.ProcessBuilder.Redirect;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Base64;
import java....
0
votes
1
answer
71
views
Using XSLT 3.0 in Saxon-JS 2, how can one configure the processor so that it accepts codepoints-to-string(8)?
Within Saxon-J, I can set the processor configuration to allow XML 1.1 characters, for example by:
processor.getUnderlyingConfiguration().setXMLVersion(XML11);
I'm looking for the equivalent in Saxon-...
0
votes
1
answer
156
views
UnicodeEncodeError: 'charmap' codec can't encode characters when writing to HTML
I have a pandas DataFrame that I wish to paste into an HTML document. The DataFrame contains Dingbat characters used as symbols to highlight values as good (checkmark), nearly bad (triangle), or bad (...
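The excerpt is cut off before it shows how the file is written, so the following is only a sketch under an assumption: that the file is opened with a legacy Windows code page (cp1252 is used here as an example) instead of UTF-8. It uses plain strings rather than pandas, and the file name `report.html` is made up for the illustration.

```python
import os
import tempfile

# Dingbat-style symbols similar to those described in the question.
html = "<p>\u2714 good, \u26A0 nearly bad</p>"

# cp1252 is a 'charmap' codec with no mapping for these symbols,
# so encoding raises UnicodeEncodeError.
try:
    html.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Passing encoding="utf-8" explicitly avoids the locale default.
path = os.path.join(tempfile.mkdtemp(), "report.html")
with open(path, "w", encoding="utf-8") as f:
    f.write(html)

with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

If the HTML is meant for a browser, a matching `<meta charset="utf-8">` declaration in the document keeps the reader and writer in agreement.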
0
votes
0
answers
67
views
Is there an equivalent of ICU4J's PersonNameFormatter in ICU4C?
I'm working on a Qt6/C++20 application that needs to handle localization. So far, we've gotten away with using Qt's built-in localization; however, we want to add person names to our UI, with proper ...
4
votes
0
answers
218
views
How can I apply tail/tails to a string of text in a Unicode-aware manner?
This "warning sign" character, ⚠️, corresponds to the sequence of codepoints U+26A0 U+FE0F (if I understand correctly, it is ⚠ followed by a variation selector character), so I can render it ...
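The excerpt does not show which language the question uses, so here is a Python sketch of the underlying issue: slicing by code point separates U+26A0 from its variation selector. The helper `simple_clusters` is a deliberate simplification (it only attaches combining marks and variation selectors U+FE00 to U+FE0F to the preceding code point) and is not a full grapheme-cluster segmentation; both function names are hypothetical.

```python
import unicodedata

def simple_clusters(text):
    """Group each code point with any following combining marks
    or variation selectors (simplified, not full segmentation)."""
    clusters = []
    for ch in text:
        is_variation_selector = 0xFE00 <= ord(ch) <= 0xFE0F
        if clusters and (is_variation_selector or unicodedata.combining(ch)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def tails(text):
    """Return every suffix of text, cut only at cluster boundaries."""
    units = simple_clusters(text)
    return ["".join(units[i:]) for i in range(len(units) + 1)]

warning = "\u26A0\uFE0F"       # WARNING SIGN + VARIATION SELECTOR-16
naive_tail = warning[1:]       # only the variation selector survives
suffixes = tails("a" + warning + "b")
```

A production version would more likely rely on a library that implements the Unicode text segmentation rules rather than this hand-rolled grouping.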
Best practices
0
votes
2
replies
39
views
Unicode encoding of units of measure
I'm working on a proprietary font that uses custom glyphs for certain units of measure. For example, I want to display pH for acidity as a single glyph, or nm for nanometers, etc. This font is ...
1
vote
1
answer
127
views
In VB.Net, why is √(x) rejected as a function name but not π(x)?
The following code is rejected with error BC30037 (invalid character) because of the √ character:
Public Function √(x As BigDecimal) As BigDecimal
return BigDecimal.SquareRoot(x, BigDecimal.Precision)
End ...
Advice
1
vote
6
replies
104
views
How to match any number of Unicode characters (letters, numbers, surrogate pairs) in regex?
How to match any number of Unicode characters (letters, numbers, surrogate pairs) in regex:
😆
ẘ😆
😆😆
ẘaሴ
abc123
I am looking for an equivalent to /^([a-zA-Z0-9]+$)/i, which only matches ASCII.
2
votes
2
answers
82
views
Why are spaces being converted to slashes when converting a string to an array buffer?
I have the following function in TypeScript which is taking in a string and converting it over to an ArrayBuffer. The returned ArrayBuffer is then being used elsewhere to create a text file for ...
1
vote
1
answer
233
views
How can I get Unicode output from robocopy in a PowerShell script?
This is a more specific case of "How to set Select-String encoding to UTF-16?"
Here's a source code example that demonstrates the problem. The output file is gibberish.
$sourcedir = $env:PUBLIC
$destdir ...
1
vote
3
answers
194
views
How to write Unicode string in C?
Here is the link.
I'm trying to write \uxxxx and \Uxxxxxxxx inside a double-quoted string:
int main()
{
    const char *a1 = "\u0041";
    const char *a2 = "\U00000041";
    return 0;
}
...