Error handling discussion


(Thomas Schaller) #1

This is a thread to discuss the general error handling story.

The ideal

  • Most functions return type-safe errors that can be easily matched and handled
  • There are errors that sum multiple of those smaller error types and can be created from them (example: IoError and FormatError could both be errors on their own and variants of an AssetError)

failure

As discussed previously, failure is no longer necessary since std's error will be adapted to support the same features as Fail. However, failure made it easy to create new error types. That’s why I created err-derive and propose we use it in Amethyst to create our errors.

In addition to that, the dynamic error type is implemented by this PR. Those two things together should be everything we need.


#2

In ideal solution it would be possible to create sum error type that doesn’t wrap other enums into new enum like this:

enum Foo {
  Bar,
  Baz,
}

enum Qwe {
  Rty,
  Uiop,
}

enum Sum {
  Foo(Foo),
  Qwe(Qwe),
}

but makes new enum flat.

enum Sum {
  Bar,
  Baz,
  Rty,
  Uiop,
}

with possibility to remove some variants because they are handled by code that returns Result<T, Sum>.

enum Sum {
  Bar,
  Baz,
  // Rty, Never occurs.
  Uiop,
}

fn foo(T) -> Result<U, Foo> { ... }
fn qwe(U) -> Result<Y, Qwe> { ... }

fn sum(t: T) -> Result<Y, Sum> {
  let u = foo(t)?;
  qwe(u).map_err(|err| match err {
    Qwe::Rty => unreachable!("This error can't happen here"),
    Qwe::Uiop => Sum::Uiop,
  })
  // `<Sum as From<Qwe>>::from` could do the same.
}

Right now defining errors with such relation requires writing everything manually.


(Thomas Schaller) #3

Ah I understand. I agree this might make sense, but I think it’s outside of the scope here.

One thing I want to say though is that this pattern can be avoided if you expose the intermediate results of your crate.

For example:

  • foo may raise FooError
  • bar may raise BarError or FooError because it calls foo

If it’s possible to run foo and pass its result to bar such that bar can no longer raise FooError, the problem is solved. Of course, this does not always work, but it’s what I’ve found useful in the past.


#4

Methods foo and bar are from dependencies. You can’t change them.


(Thomas Schaller) #5

@udoprog mentioned on Discord that err-derive only solves half of the problem. Were you referring to amethyst_error, or to backtraces?


(Thomas Schaller) #6

Um, so what’s the function you define? :sweat_smile:

I just called the function we define bar. Whatever you call it, require the user to pass the results of other functions if that is compatible with your API. That way, you don’t have to wrap them in your own error enum.


#7

foo and bar are functions from dependencies.
You define sum that uses foo and bar in the way that guarantee that some error cases in bar never occur.
If user would call foo it will break this guarantee, it was the main purpose of the sum function.

I’ll give you real example.

rendy::Factory::create_buffer uses gfx_hal::Device::allocate_memory, gfx_hal::Device::create_buffer and gfx_hal::Device::bind_buffer_memory functions internally.
The way rendy::Factory::create_buffer is written guarantees that gfx_hal::Device::bind_buffer_memory cannot fail due to wrong memory of out of bound access, but still can due to OOM.

In this particular case I can match and extract OOM error type from gfx_hal::BindError as it is already convertible to factory error. But this is already more code than simple ?.

Also wrapping errors one in another creates duplicates of common errors. For instance almost all error types in gfx-hal has OOM variant. So when you make an error type out of few gfx-hal error types you get OOM variant multiple times.
MyError::BindError(gfx_hal::BindError::OOM(oom)) and MyError::CreationError(gfx_hal::CreationError::OOM(oom)) are the same thing - it’s MyError that is caused by OOM. Ideally they should collapse into single MyError::OOM variant.


because I’m lazy rendy actually returns failure::Error everywhere.


(John John Tedro) #8

Backtraces and causes, yes. To get these we are today required to “wrap” most errors in some way because stable std::error::Error doesn’t have these yet.

amethyst_error is a compromise to generically wrap errors with backtraces and causes, and address compositional error handling cases. Like when you have a trait that is expected to be implemented by dependent crates:

trait Something {
    fn something(&self) -> Result<(), Error>;
}

Note that eventually I’m hoping all these cases can be phased out with something like:

trait Something {
    type Err: std::error::Error;
    
    fn something(&self) -> Result<(), Self::Err>;
}

After which each implementer can provide local error types which can be generated through something like the derive-err crate. But this should wait on a fully functional std::error::Error, and is a large undertaking. This is why I’d prefer an intermediate step using amethyst_error to start implementing more error chains today.

But the intent is fully to phase this out, once stable errors are available. I’d still expect some local error types to at least intermittently implement escape hatches like the one seen in amethyst_error today (Box<Error>).


(Thomas Schaller) #9

Err, I’m certain std Errors support causes, because that’s what the derive does and what I use in my crates :sweat_smile:


(John John Tedro) #10

Aha, yeah. Downcasting is really important for providing good diagnostics with an error chain, and I misremembered that source is now stable.

But I’m also considering about the frontend part when composing causes. with_context, chain_err, etc… Plumbing that simplifies generically putting together new errors with causes. For this we’d need something like a stable Context type and a blanket implementation for Result<T, E>. But I’m not sure what is the right for for std.


(Richard Dodd (Dodj)) #11

I’ve tried to do this before - happy to help out with any current efforts. I’ve bought into the idea of using std::error::Error and I think having a macro to generate the display impls etc is very sensible.


(Lucio Franco) #12

I thought cause got deprecated, at least that’s what I keep seeing in tokio.

I feel this will be a lot easier once we have existential types.

Now I have to ask how many times do we actually use this pattern?

This makes a lot more sense now that I read this when I’m awake. Though I worry that we are putting in a double effort for this may be.

That said, I have spent some time thinking about this and I’m a lot more ok with it than I was before. I can give the PR another pass if you’d like @udoprog.


(Thomas Schaller) #13

Got a solution in mind for your use case. Let me implement it this afternoon and move our side discussion out then.


(Richard Dodd (Dodj)) #14

Moved from github.

I’m just going to write a bit of a blurb about how I see error handling in rust atm. It’s fairly subjective, and there are plenty of things to argue with.

Rust has some great abstractions for handling failure situations, where some I/O or something else hasn’t worked out. You have 3 choices when something hasn’t worked.

  1. Ignore the failure and continue anyway
  2. (If you’re in a function) return a Result<T, E> instead of T
    1. the type E implements std::error::Error
    2. the type E has no constraints
  3. panic!

I would try to avoid (1) at all costs. For me, there are some instances of this in amethyst - I think there are systems that just do nothing if some resource or component isn’t present that should be. This indicates a programmer error and they should panic (with a message explaining the problem).

It’s also not true IMO that panicking is strictly worse than Result, sometimes it is more appropriate (where there is mis-configuration or programmer error), as long as the invariants required to avoid it are well documented.

If you’re using a Result, then you need to decide what the error type will be. I would argue that std::error::Error is really only for errors that will be reported to a person as text. Its main functionality is that it can be printed (using Display) and can contain another error (using source). If you’re not using this functionality, you’re essentially implementing Error just as a marker, which might not be worth doing. Any error you’re going to recover from is probably not worth implementing Error for.

This brings me to my next point - sometimes it’s OK for errors to be strings of text: specifically when the error is unrecoverable and only going to be consumed by a human. Say you have a function that can error, and there is only 1 way you can error. You’ll probably do something like the following

#[derive(Debug)]
struct MyError;

impl Display for MyError {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        f.write_str("this is the error message")
    }
}

That’s a lot of lines to just give a string error, and since there is only one way it can fail, a machine can just ignore the value of the error.

Of course there are gray areas, mostly when you have an error that you may recover from. For example, if a texture isn’t available, you might want to continue and display an error message in-game, because that helps modders more than crashing out. In that case you would use an enum error, and implement Display + Error as a courtesy, so the game developer can decide what to do.

Another issue is how to deal with io::Error, and it demonstrates the value, and limitations, of error chains. It may be that there are some kinds I expect more - maybe that a file is missing, or that I don’t have permission to write a file I need to. You can test for these kinds and convert them to your own error kind, but if you encounter an error (say InvalidData for example) that you didn’t expect, you want to give the user as much information as possible. On the other hand, all the information you got from the OS was invalid data that’s not going to be much more use than an unexpected error occurred.

Summary

  1. Avoid failing silently, a.k.a. allowing code paths that you haven’t really thought about, and where you don’t know what’s going to happen.
  2. Use panic! to indicate to a developer using your crate that he has not upheld some invariant you insist upon.
  3. Use Result<T, E> for things that might fail otherwise.
  4. Always implement std::error::Error when an error might be used in an error chain.
  5. Maybe implement std::error::Error at other times (it’s not 100% essential, and you’re basically just doing it for the Display impl and the marker).
  6. Keep the happy path fast and lean.
  7. Box your error if it contains a lot of data. Result is as wide as its widest variant, and you’re only going to pay the cost of indirection once something has already gone wrong (i.e. performance probably isn’t so important any more). I think it’s perfectly acceptable to return Result<T, Box<MyError>> when you want to keep the size down. As always with performance, benchmark. :slight_smile:
  8. Trait objects make everything more complicated. You start having issues with Send/Sync/'static. Most of the time you don’t need them, you can just use cause: ConcreteInnerErrorType. The only time you use dyn Error (+ 'static + Send + Sync + ...) is when you’re implementing source.
  9. Sometimes it makes sense to wrap the underlying error (e.g. IO error), sometimes it makes sense to subsume it (e.g. unit struct error). Use your judgment.
  10. Don’t be afraid of just implementing the trait in the way that makes sense for your error type. You don’t need to use a procedural macro and shouldn’t unless it’s literally just to save some boilerplate.

Example

use std::{
    error::Error,
    fmt,
    fs::File,
    io,
    path::{Path, PathBuf},
};

pub fn load_image(image_path: &str) -> Result<Image, MyError> {
    let mut img_src =
        File::open(image_path).map_err(|e| MyError::error_opening_file(image_path, e))?;
    let img = process_image(&mut img_src).map_err(|e| MyError::error_in_lib(image_path, e))?;
    Ok(img)
}

// Error types

/// The error type for this function. Could be re-used if there are other functions with exactly
/// the same error possibilities.
#[derive(Debug, Clone)]
pub struct MyError {
    // These could be public, but using accessor methods leaves open the possibilities of
    // reorganising things in the future.
    /// The path to the image we tried to load
    image_path: PathBuf,
    /// The type of error we encountered
    kind: MyErrorKind,
}

#[derive(Debug)]
pub enum MyErrorKind {
    /// Error opening file stream.
    OpeningFile(io::Error),
    // here we've chosen to consume the unerlying error. We could decide to wrap it, but I think
    // that would be more complicated.
    /// Error reading data from the file stream.
    ReadingData(io::Error),
    /// The magic bytes were wrong
    Magic([u8; 4]),
    /// Data in the stream was invalid
    Invalid(usize, Option<String>),
    /// Some error we didn't expect.
    // By capturing the error this way, we can still use `Display` to print out anything in it.
    // It's the only time you want to be storing it as a trait object.
    UnknownLib(Box<dyn Error + Send + Sync>),
}

// Impls

impl MyError {
    /// What type of error occurred.
    pub fn kind(&self) -> &MyErrorKind {
        &self.kind
    }

    /// The image file we were trying to work with when this error occurred.
    pub fn image_path(&self) -> &Path {
        self.image_path.as_ref()
    }

    /// Wrap an io::Error the way we want. Helper method
    fn error_opening_file(image_path: impl Into<PathBuf>, e: io::Error) -> Self {
        let kind = MyErrorKind::OpeningFile(e); // io::Error isn't too wide, so no indrection.
        MyError::from_parts(image_path, kind)
    }

    /// Process a library error. Helper method
    fn error_in_lib(image_path: impl Into<PathBuf>, e: ImageError) -> Self {
        let kind = match e {
            ImageError::Io(e) => MyErrorKind::ReadingData(e),
            ImageError::Magic(bytes) => MyErrorKind::Magic(bytes),
            ImageError::Invalid(pos, msg) => MyErrorKind::Invalid(pos, msg),
            e => MyErrorKind::UnknownLib(Box::new(e) as Box<dyn Error + Send + Sync>),
        };
        MyError::from_parts(image_path, kind)
    }

    /// Boilerplate for creating the struct. Helper method
    fn from_parts(image_path: impl Into<PathBuf>, kind: MyErrorKind) -> Self {
        MyError {
            image_path: image_path.into(),
            kind,
        }
    }
}

impl fmt::Display for MyError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let image_path = self.image_path.display();
        match self.kind() {
            MyErrorKind::OpeningFile(e) => {
                let cannot_open = format!("cannot open file at \"{}\" as an image because", image_path);
                match e.kind() {
                    io::ErrorKind::NotFound => write!(f, "{} it could not be found", cannot_open),
                    io::ErrorKind::PermissionDenied => write!(f, "{} this process doesn't have permission to read it", cannot_open),
                    _ => write!(f, "{} of an unexpected i/o error: {}", cannot_open, e),
                }
            }
            // If we've already opened the file - something wierd is happening if we can't read it,
            // so just print the i/o error:
            MyErrorKind::ReadingData(e) => write!(f, "cannot read from file at \"{}\" because of an unexpected i/o error: {}", image_path, e),
            MyErrorKind::Magic(bytes) => write!(f, "while reading image at \"{}\", expected the magic bytes [1, 2, 3, 4] but found {:?}", image_path, bytes),
            MyErrorKind::Invalid(pos, msg) => {
                let invalid_offset = format!("image at \"{}\" has an invalid byte at offset {}", image_path, pos);
                match msg {
                    Some(msg) => write!(f, "{}: {}", invalid_offset, msg),
                    None => write!(f, "{}", invalid_offset),
                }
            }
            MyErrorKind::UnknownLib(e) => write!(
                f,
                "while decoding image at \"{}\", received the error \"{}\"",
                self.image_path.display(), e
            ),
        }
    }
}

// We can't derive this because io::Error doesn't implement clone. We're just going to discard any
// inner error here.
impl Clone for MyErrorKind {
    fn clone(&self) -> MyErrorKind {
        match self {
            // Discard what we have to to make this cloneable.
            MyErrorKind::OpeningFile(e) => MyErrorKind::OpeningFile(e.kind().into()),
            MyErrorKind::ReadingData(e) => MyErrorKind::ReadingData(e.kind().into()),
            // Next things we can clone
            MyErrorKind::Magic(bytes) => MyErrorKind::Magic(*bytes),
            MyErrorKind::Invalid(pos, msg) => MyErrorKind::Invalid(*pos, msg.clone()),
            // The error might not be clone, so just convert it to text.
            MyErrorKind::UnknownLib(e) => MyErrorKind::UnknownLib(Box::new(StringError::from(
                e.to_string(),
            ))
                as Box<dyn Error + Send + Sync>),
        }
    }
}

// I really think Error should be impl for String
#[derive(Debug, Clone, Eq, PartialEq, Ord, PartialOrd, Hash, Default)]
#[repr(transparent)]
pub struct StringError(String);

impl fmt::Display for StringError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        fmt::Display::fmt(&self.0, f)
    }
}

impl Error for StringError {}

impl From<String> for StringError {
    fn from(s: String) -> StringError {
        StringError(s)
    }
}

impl From<StringError> for String {
    fn from(e: StringError) -> String {
        e.0
    }
}

// External

// Imagine ImageError looks like this. It's good to think how you would wrap all kinds of errors,
// since different crate authors will do different things.

// Because ImageError is Send + Sync, changing this would be backwards-incompatible.
// That's why we can use `dyn Error + Send + Sync` above. This isn't always true.
#[derive(Debug)]
pub enum ImageError {
    /// An i/o error occurred.
    Io(io::Error),
    /// The magic 4 bytes were wrong.
    Magic([u8; 4]),
    /// A byte in the file was invalid for its position. Some more information is given as text.
    Invalid(usize, Option<String>),
    #[doc(hidden)]
    __NonExhaustive,
}

impl fmt::Display for ImageError {
    fn fmt(&self, _f: &mut fmt::Formatter) -> fmt::Result {
        unimplemented!()
    }
}

impl Error for ImageError {}

pub fn process_image(_r: &mut impl io::Read) -> Result<Image, ImageError> {
    unimplemented!()
}

pub type Image = ();

(John John Tedro) #15

Nice post!

I’d like to emphasize that we be cautious of panicking by default. And that we strive to provide as “safe” an API as possible and limit panics to internal invariants, similarly to how you wrap unsafe APIs in safe ones.


(Richard Dodd (Dodj)) #16

It’s now always true that panicking is wrong. Consider the following from amethyst source

/// # Panics
///
/// If two system are added that share an identical name, this function will panic.
/// Empty names are permitted, and this function will not panic if more then two are added.

This is the correct behavior, as the crate user has misused the system. We shouldn’t try to recover from this - we could just continue and skip inserting the second System, but this is almost certainly not what the user wants and just turns the runtime error into a logic error (worse).


(John John Tedro) #17

I hear what you are saying, but it’s a stretch to read “I’d like to emphasize that we be cautious of panicking by default” as “panicking is wrong”.

For that particular API there’s IMO no strong reason to panic compared to bailing, but there’s also no strong reason not to since no one has so far needed to work around that particular panic.


(Richard Dodd (Dodj)) #18

I wasn’t disagreeing with what you were saying directly, just saying that there’s a balance to be struck. :slight_smile:


(John John Tedro) #19

@omni-viral We can discuss which backtraces to store here (from: this discussion).

My hesitation to erase intermediate backtraces comes from cases like these:

fn do_foo() -> Result<(), Error> {
    /*  */
}

fn handle_result(res: Result<T, Error>) {
  res.with_context(|_| Error::from_string("some weird error handling"))
}

fn do_something() -> Result<(), Error> {
    let res = do_foo();
    handle_result(res)
}

Note that you usually see this in more complex scenarios. The above is somewhat simplified.

If we erase everything except the root backtrace it’s hard to decode exactly how and where errors are captured. What I would like to do is to compress all backtraces to only include what is relevant for each code path. This is what Java does by default:

javax.servlet.ServletException: Something bad happened
    at com.example.myproject.OpenSessionInViewFilter.doFilter(OpenSessionInViewFilter.java:60)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at com.example.myproject.ExceptionHandlerFilter.doFilter(ExceptionHandlerFilter.java:28)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at com.example.myproject.OutputBufferFilter.doFilter(OutputBufferFilter.java:33)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:943)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: com.example.myproject.MyProjectServletException
    at com.example.myproject.MyServlet.doPost(MyServlet.java:169)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
    at com.example.myproject.OpenSessionInViewFilter.doFilter(OpenSessionInViewFilter.java:30)
    ... 27 more
Caused by: org.hibernate.exception.ConstraintViolationException: could not insert: [com.example.myproject.MyEntity]
    at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:96)
    at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:66)
    at org.hibernate.id.insert.AbstractSelectingDelegate.performInsert(AbstractSelectingDelegate.java:64)
    at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2329)
    at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2822)
    at org.hibernate.action.EntityIdentityInsertAction.execute(EntityIdentityInsertAction.java:71)
    at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:268)
    at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:321)
    at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:204)
    at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:130)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:210)
    at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:56)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:195)
    at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:50)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:93)
    at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:705)
    at org.hibernate.impl.SessionImpl.save(SessionImpl.java:693)
    at org.hibernate.impl.SessionImpl.save(SessionImpl.java:689)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.hibernate.context.ThreadLocalSessionContext$TransactionProtectionWrapper.invoke(ThreadLocalSessionContext.java:344)
    at $Proxy19.save(Unknown Source)
    at com.example.myproject.MyEntityService.save(MyEntityService.java:59) <-- relevant call (see notes below)
    at com.example.myproject.MyServlet.doPost(MyServlet.java:164)
    ... 32 more
Caused by: java.sql.SQLException: Violation of unique constraint MY_ENTITY_UK_1: duplicate value(s) for column(s) MY_COLUMN in statement [...]
    at org.hsqldb.jdbc.Util.throwError(Unknown Source)
    at org.hsqldb.jdbc.jdbcPreparedStatement.executeUpdate(Unknown Source)
    at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeUpdate(NewProxyPreparedStatement.java:105)
    at org.hibernate.id.insert.AbstractSelectingDelegate.performInsert(AbstractSelectingDelegate.java:57)
    ... 54 more

You can see the ... X more lines below each stack trace. This gives you complete context of where each backtrace was captured and decoded.

Finally, if we have cases which are performance sensitive when bailing, it should be handled using a custom error type.


#20

So you want handle_result to appear in backtrace? I wouldn’t want that as handle_result happened after error occurred. I only want a backtrace to do_foo call and deeper to the point where error did occur.