Character Sets
The zio.nio.charset package offers an API for ZIO programs to work with character sets, built on the Java NIO support for character sets. Any character set supported by your JVM can be used.
Charset
This class wraps the java.nio.charset.Charset class with a ZIO-friendly API. There are convenience methods for:
- encoding/decoding buffers
- encoding/decoding single chunks
- encoding/decoding strings
For more sophisticated encoding/decoding needs, a CharsetEncoder or CharsetDecoder can be obtained from a Charset.
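As a rough illustration of what the wrapper builds on, the sketch below uses the underlying java.nio.charset API directly rather than the ZIO wrapper. The object and method names (`CharsetBasics`, `encode`, `decode`, `strictDecoder`) are hypothetical and exist only for this example; it shows the convenience path for strings and how a dedicated decoder can be configured when stricter error handling is needed.

```scala
import java.nio.ByteBuffer
import java.nio.charset.{CharsetDecoder, CodingErrorAction, StandardCharsets}

// Hypothetical helper object, for illustration only (not part of zio-nio).
object CharsetBasics {
  private val utf8 = StandardCharsets.UTF_8

  // Convenience path: encode a whole string in one call.
  def encode(s: String): Array[Byte] = {
    val buf: ByteBuffer = utf8.encode(s)
    val out = new Array[Byte](buf.remaining())
    buf.get(out)
    out
  }

  // Convenience path: decode a whole byte array in one call.
  // Malformed input is silently replaced on this path.
  def decode(bytes: Array[Byte]): String =
    utf8.decode(ByteBuffer.wrap(bytes)).toString

  // Finer control: a dedicated decoder that reports bad input
  // (by throwing from its one-argument decode) instead of replacing it.
  def strictDecoder(): CharsetDecoder =
    utf8
      .newDecoder()
      .onMalformedInput(CodingErrorAction.REPORT)
      .onUnmappableCharacter(CodingErrorAction.REPORT)
}
```

The convenience methods suit whole values held in memory; a separate encoder or decoder object is the better fit when error handling or incremental processing matters.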
Standard Charsets
The standard set of charsets provided by Java is available in Charset.Standard:
- utf8
- utf16
- utf16Be
- utf16Le
- usAscii
- iso8859_1
JVMs typically support many more charsets than these; use Charset.availableCharsets to retrieve the complete list.
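To get a feel for what a given JVM offers, the sketch below queries the underlying java.nio.charset.Charset registry directly. This is an assumption-laden illustration: `AvailableCharsets`, `names` and `lookup` are hypothetical names, and the ZIO wrapper's Charset.availableCharsets may expose the same information in a different shape.

```scala
import java.nio.charset.Charset
import scala.util.Try

// Hypothetical helper object, for illustration only (not part of zio-nio).
object AvailableCharsets {

  // Canonical names of every charset this JVM supports, in sorted order.
  def names(): List[String] = {
    val it = Charset.availableCharsets().keySet().iterator()
    val builder = List.newBuilder[String]
    while (it.hasNext) builder += it.next()
    builder.result()
  }

  // Look a charset up by name; None if the name is illegal or unsupported.
  def lookup(name: String): Option[Charset] =
    Try(Charset.forName(name)).toOption
}
```

The six standard charsets are guaranteed on every Java platform; anything beyond them should be looked up defensively, as `lookup` does here.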
Example
import zio.nio.charset._
import zio.nio.file.Files
import zio.nio.file.Path

val s = "Hello, world!"

for {
  // encode the string as UTF-16 bytes
  utf16Bytes <- Charset.Standard.utf16.encodeString(s)
  // write the encoded bytes to a file
  _          <- Files.writeBytes(Path("utf16.txt"), utf16Bytes)
} yield ()
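It can help to know what bytes to expect in a file like utf16.txt. The sketch below uses the plain Java UTF-16 charsets, on the assumption that the ZIO wrapper delegates to the same encoders; `Utf16Layout` is a hypothetical name used only here. Java's UTF-16 encoder writes a big-endian byte order mark followed by big-endian code units, while the UTF-16BE and UTF-16LE variants write no mark.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helper object, for illustration only (not part of zio-nio).
object Utf16Layout {
  val s = "Hello, world!" // 13 characters, all in the Basic Multilingual Plane

  // 2-byte byte order mark (FE FF) + 13 * 2 bytes = 28 bytes
  val utf16: Array[Byte] = s.getBytes(StandardCharsets.UTF_16)

  // No byte order mark: 13 * 2 = 26 bytes each
  val utf16Be: Array[Byte] = s.getBytes(StandardCharsets.UTF_16BE)
  val utf16Le: Array[Byte] = s.getBytes(StandardCharsets.UTF_16LE)

  // Render bytes as space-separated lowercase hex for inspection.
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString(" ")
}
```

If a consumer of the file does not expect a byte order mark, choosing utf16Be or utf16Le explicitly avoids writing one.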
Stream Encoding and Decoding
Streams are a better fit than buffers or chunks for larger jobs, since the whole input never has to be held in memory at once. ZIO Streams comes with a UTF-8 decoder built in, but if you need other character sets, or you need encoding, then ZIO-NIO can help, as long as you're running on the JVM.
Stream-based encoding and decoding are provided by the transducer method of the CharsetEncoder and CharsetDecoder classes respectively.
import zio.nio.charset._
import zio.nio.channels.FileChannel
import zio.nio.channels._
import zio.nio.file.Path
import zio.stream.ZStream
import zio.blocking.Blocking
import zio.console
import zio.ZIO

// dump a file encoded in ISO-8859-1 to the console
FileChannel.open(Path("iso8859.txt")).useNioBlockingOps { fileOps =>
  // read the file in chunks of up to 1000 bytes until it is exhausted
  val inStream: ZStream[Blocking, Exception, Byte] = ZStream.repeatEffectChunkOption {
    fileOps.readChunk(1000).asSomeError.flatMap { chunk =>
      if (chunk.isEmpty) ZIO.fail(None) else ZIO.succeed(chunk)
    }
  }

  // apply decoding transducer
  val charStream: ZStream[Blocking, Exception, Char] =
    inStream.transduce(Charset.Standard.iso8859_1.newDecoder.transducer())

  console.putStrLn("ISO8859 file contents:") *>
    charStream.foreachChunk(chars => console.putStr(chars.mkString))
}
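One reason a stateful decoder is needed for streams is that a multi-byte character can be split across two chunks. The sketch below is not the zio-nio transducer implementation; it is a minimal illustration using the plain java.nio.charset.CharsetDecoder, with hypothetical names (`ChunkedDecoding`, `decodeChunks`), of carrying undecoded bytes over from one chunk to the next.

```scala
import java.nio.{ByteBuffer, CharBuffer}
import java.nio.charset.StandardCharsets

// Hypothetical helper object, for illustration only (not part of zio-nio).
object ChunkedDecoding {

  // Decode UTF-8 bytes that arrive in arbitrary chunks.
  def decodeChunks(chunks: List[Array[Byte]]): String = {
    val decoder  = StandardCharsets.UTF_8.newDecoder()
    val chars    = CharBuffer.allocate(1024)
    val out      = new StringBuilder
    var leftover = Array.emptyByteArray // bytes of an incomplete character

    // Move decoded characters from the buffer into the result.
    def drain(): Unit = {
      chars.flip()
      out.append(chars.toString)
      chars.clear()
    }

    // Decode as much of `in` as possible, retrying when the output fills up.
    def feed(in: ByteBuffer, endOfInput: Boolean): Unit = {
      var done = false
      while (!done) {
        val result = decoder.decode(in, chars, endOfInput)
        drain()
        if (result.isError) result.throwException() // malformed input
        done = result.isUnderflow
      }
    }

    for (chunk <- chunks) {
      val in = ByteBuffer.wrap(leftover ++ chunk)
      feed(in, endOfInput = false)
      // Keep any trailing partial character for the next chunk.
      leftover = new Array[Byte](in.remaining())
      in.get(leftover)
    }

    // Signal end of input, then flush any internal decoder state.
    feed(ByteBuffer.wrap(leftover), endOfInput = true)
    decoder.flush(chars)
    drain()
    out.toString
  }
}
```

Decoding each chunk independently would corrupt any character that straddles a chunk boundary, which is why stream decoding goes through a single decoder instance for the whole stream.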