ejolson wrote: ↑Sat May 27, 2023 4:22 pm
How hard would it be to use ICU instead of utfcpp?
A lot harder than it should be. The ICU C/C++ API is unnecessarily cumbersome to use (this is not just my opinion; they say so themselves in the documentation).
Specifically for this application: their iterator classes are not actually C++ iterators, so they do not compose with the standard library algorithms, and you end up writing ugly glue code or iterating manually.
They also require the entire input string to be in memory, so you would need to either load it into a buffer yourself or resort to something like mmap. That complicates matters beyond what is appropriate for a “simple example of a nontrivial program” aimed at beginners.
ejolson wrote: ↑Sat May 27, 2023 6:18 pm
It looks like I need to add a library somewhere.
Indeed, here's a minimum viable Conan+CMake project:
CMakeLists.txt
cmake_minimum_required(VERSION 3.20)
project(utf8)
find_package(utf8cpp REQUIRED)
find_package(fmt REQUIRED)
add_executable(main "main.cpp")
target_compile_options(main PRIVATE -Wall -Wextra) # Bare minimum
target_compile_features(main PRIVATE cxx_std_20)
target_link_libraries(main PRIVATE fmt::fmt utf8cpp)
conanfile.txt
[requires]
utfcpp/3.2.3
fmt/10.0.0
[generators]
CMakeDeps
CMakeToolchain
main.cpp
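#include <algorithm>
#include <cstddef>   // std::ptrdiff_t
#include <cstdio>    // stderr
#include <exception> // std::exception
#include <filesystem>
#include <fstream>
#include <iterator>  // std::istreambuf_iterator
namespace fs = std::filesystem;
#include <fmt/core.h> // https://github.com/fmtlib/fmt
#include <utf8.h> // https://github.com/nemtrif/utfcpp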
// Returns true if the file can be opened and its entire contents are valid UTF-8.
bool contains_valid_utf8(const fs::path &path) {
    std::ifstream file{path, std::ios::binary};
    if (!file)
        return false;
    // Validate straight from the stream, without buffering the whole file
    return utf8::is_valid(std::istreambuf_iterator(file), {});
}
int main() try {
    std::ptrdiff_t total = 0;
    fs::recursive_directory_iterator dir_it{fs::current_path()};
    auto utf8_count = std::ranges::count_if(dir_it, [&](const auto &entry) {
        if (!entry.is_regular_file())
            return false;
        ++total;
        return contains_valid_utf8(entry.path());
    });
    // Guard against 0/0 (NaN) when no regular files were found
    auto ratio = total > 0 ? static_cast<double>(utf8_count) / static_cast<double>(total) : 0.0;
    fmt::print("Fido's UTF-8 popularity statistics:\n\n");
    fmt::print("\tTotal files: {}\n\tUTF-8 files: {}\n", total, utf8_count);
    fmt::print("\tUTF-8/total: {}%\n", 100.0 * ratio);
} catch (const std::exception &e) {
    fmt::print(stderr, "Uncaught exception: {}\n", e.what());
    return 1;
} catch (...) {
    fmt::print(stderr, "Uncaught error\n");
    return 1;
}
Then you can just build it using the standard Conan and CMake workflows:
conan install . --output-folder=build --build=missing
cmake -S. -Bbuild --toolchain=build/conan_toolchain.cmake -G "Ninja Multi-Config"
cmake --build build --config Release -j
./build/Release/main