Class EnhancedGenericDatumReader

  • All Implemented Interfaces:
    org.apache.avro.io.DatumReader<org.apache.avro.generic.GenericRecord>
    Direct Known Subclasses:
    ArrayAsTroveListGenericDatumReader, MutableArrayElementGenericDatumReader

    public class EnhancedGenericDatumReader
    extends org.apache.avro.generic.GenericDatumReader<org.apache.avro.generic.GenericRecord>
    Extends the Avro core generic datum reader to use EnhancedGenericData by default, with optional collection and logging of detailed statistics (count of objects read) about reader activity.

    This implementation also offers customization hooks for subclasses that want more efficient (typically more memory-efficient) reading of scalar or array values. See doReadWithoutConversion(Object, Schema, ResolvingDecoder) and doReadArray(Object, Schema, ResolvingDecoder). Subclasses may also make use of the extra reader state exposed through currentRecordSchema and currentField.
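The hook layout described above (final read methods such as readArray, paired with overridable doReadArray and doReadWithoutConversion) can be sketched without any Avro dependency. Everything in this sketch is a hypothetical stand-in: HookSketch, BaseReader, PrimitiveArrayReader and the int[] input replace the real ResolvingDecoder-based signatures, and the counter only imitates whatever bookkeeping the final wrapper may perform. It shows the shape of the pattern, not the library's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the final-wrapper / overridable-hook layout. */
class HookSketch {

    /** Plays the role of the base reader: final entry point, overridable hook. */
    static class BaseReader {
        long arraysRead; // imitates per-reader bookkeeping (assumption)

        /** Final wrapper: subclasses cannot bypass the bookkeeping. */
        public final Object readArray(int[] encoded) {
            arraysRead++;
            return doReadArray(encoded);
        }

        /** Default hook: boxes every element into a List. */
        protected Object doReadArray(int[] encoded) {
            List<Integer> boxed = new ArrayList<>(encoded.length);
            for (int value : encoded) {
                boxed.add(value);
            }
            return boxed;
        }
    }

    /** Memory-friendlier subclass: keeps a primitive array instead of boxing. */
    static class PrimitiveArrayReader extends BaseReader {
        @Override
        protected Object doReadArray(int[] encoded) {
            return encoded.clone();
        }
    }

    public static void main(String[] args) {
        BaseReader reader = new PrimitiveArrayReader();
        Object result = reader.readArray(new int[] {1, 2, 3});
        System.out.println(result.getClass().getSimpleName()
                + ", arrays read: " + reader.arraysRead);
    }
}
```

Keeping the public entry point final and exposing only the do... hook means a subclass can change the in-memory representation while the wrapper's bookkeeping still runs on every read.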

    In keeping with the Avro core philosophy, one may also inject a custom or extended implementation of GenericData into this reader, to benefit the reading process without directly subclassing this implementation. Refer to the alternate constructors that accept a GenericData instance. Note that by default (when nothing is injected explicitly) this reader already uses an enhanced implementation of generic data. To use the core Avro implementation instead, inject it through one of those alternate constructors.

    Thread Safety - Each reader instance must be used by at most one thread at a time while reading. However, a reader instance may be used by different threads over time (sequentially). Moreover, this class implements thread-safe collection of statistics aggregated across multiple reader instances used concurrently by multiple threads over time.
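The threading contract above (one reader per thread at a time, one statistics holder shared by all readers) can be illustrated with plain JDK classes. This is a minimal sketch under stated assumptions: SharedStatsSketch, Reader and RECORDS_READ are hypothetical names, and LongAdder is just one reasonable choice for a concurrent counter; the source does not say how the real Statistics class is implemented.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in: one reader per task, one shared statistics counter. */
class SharedStatsSketch {

    /** Shared by all reader instances; LongAdder tolerates concurrent updates. */
    static final LongAdder RECORDS_READ = new LongAdder();

    /** Holds mutable per-read state, so it must not be shared between threads at once. */
    static class Reader {
        private int currentFieldIndex; // imitates state such as currentField

        void readRecord(int fieldCount) {
            for (currentFieldIndex = 0; currentFieldIndex < fieldCount; currentFieldIndex++) {
                // a real reader would decode one field here
            }
            RECORDS_READ.increment();
        }
    }

    /** Runs several readers concurrently and returns the aggregated count. */
    static long readConcurrently(int threads, int recordsPerThread) {
        RECORDS_READ.reset();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                Reader reader = new Reader(); // each task owns its own reader
                for (int i = 0; i < recordsPerThread; i++) {
                    reader.readRecord(3);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("readers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for readers", e);
        }
        return RECORDS_READ.sum();
    }

    public static void main(String[] args) {
        System.out.println("records read: " + readConcurrently(4, 1000));
    }
}
```

Only the shared counter needs to be safe for concurrent use; the per-reader state stays unsynchronized because each reader is confined to one thread at a time.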

    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected org.apache.avro.Schema.Field currentField  
      protected org.apache.avro.Schema currentRecordSchema  
      protected static com.activeviam.io.data.source.EnhancedGenericDatumReader.Statistics stats  
      protected boolean withDetailedMonitoring  
    • Constructor Summary

      Constructors 
      Constructor Description
      EnhancedGenericDatumReader​(org.apache.avro.Schema schema)
      Create an enhanced generic datum reader with detailed monitoring disabled.
      EnhancedGenericDatumReader​(org.apache.avro.Schema schema, boolean withDetailedMonitoring)
      Create an enhanced generic datum reader with optional enabling of detailed monitoring.
      EnhancedGenericDatumReader​(org.apache.avro.Schema schema, org.apache.avro.generic.GenericData data)
      Create an enhanced generic datum reader with explicit specification of the GenericData instance to leverage, and with detailed monitoring disabled.
      EnhancedGenericDatumReader​(org.apache.avro.Schema schema, org.apache.avro.generic.GenericData data, boolean withDetailedMonitoring)
      Create an enhanced generic datum reader with explicit specification of the GenericData instance to leverage, and with optional enabling of detailed monitoring.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      protected Object doReadArray​(Object old, org.apache.avro.Schema expected, org.apache.avro.io.ResolvingDecoder in)  
      protected Object doReadWithoutConversion​(Object old, org.apache.avro.Schema expected, org.apache.avro.io.ResolvingDecoder in)  
      static com.quartetfs.fwk.impl.Pair<Boolean,​org.apache.avro.Schema.Type> extractSimpleArrayElementInfo​(org.apache.avro.Schema arraySchema)
      Extract relevant information for the type of elements supported by a given simple Avro array.
      static com.quartetfs.fwk.impl.Pair<Boolean,​org.apache.avro.Schema.Type> extractSimpleUnionInfo​(org.apache.avro.Schema unionSchema)
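      Assume a simple Union Avro type (i.e. no more than one non-null type) and extract relevant information (i.e. whether it is nullable, and the non-null type if any).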
      void logStatistics()
      Log the current state of aggregated statistics.
      org.apache.avro.generic.GenericRecord read​(org.apache.avro.generic.GenericRecord reuse, org.apache.avro.io.Decoder in)  
      protected Object readArray​(Object old, org.apache.avro.Schema expected, org.apache.avro.io.ResolvingDecoder in)  
      protected void readField​(Object r, org.apache.avro.Schema.Field f, Object oldDatum, org.apache.avro.io.ResolvingDecoder in, Object state)  
      protected Object readRecord​(Object old, org.apache.avro.Schema expected, org.apache.avro.io.ResolvingDecoder in)  
      protected Object readWithoutConversion​(Object old, org.apache.avro.Schema expected, org.apache.avro.io.ResolvingDecoder in)  
      • Methods inherited from class org.apache.avro.generic.GenericDatumReader

        addToArray, addToMap, convert, createBytes, createEnum, createFixed, createFixed, createString, findStringClass, getData, getExpected, getResolver, getSchema, newArray, newInstanceFromString, newMap, newRecord, peekArray, read, readBytes, readBytes, readEnum, readFixed, readInt, readMap, readMapKey, readString, readString, readWithConversion, setExpected, setSchema, skip
    • Field Detail

      • stats

        protected static com.activeviam.io.data.source.EnhancedGenericDatumReader.Statistics stats
      • withDetailedMonitoring

        protected final boolean withDetailedMonitoring
      • currentRecordSchema

        protected org.apache.avro.Schema currentRecordSchema
      • currentField

        protected org.apache.avro.Schema.Field currentField
    • Constructor Detail

      • EnhancedGenericDatumReader

        public EnhancedGenericDatumReader​(org.apache.avro.Schema schema)
        Create an enhanced generic datum reader with detailed monitoring disabled.
        Parameters:
        schema - The expected schema of Avro records to read.
      • EnhancedGenericDatumReader

        public EnhancedGenericDatumReader​(org.apache.avro.Schema schema,
                                          boolean withDetailedMonitoring)
        Create an enhanced generic datum reader with optional enabling of detailed monitoring.
        Parameters:
        schema - The expected schema of Avro records to read.
        withDetailedMonitoring - Whether to enable detailed monitoring (not recommended for production use).
      • EnhancedGenericDatumReader

        public EnhancedGenericDatumReader​(org.apache.avro.Schema schema,
                                          org.apache.avro.generic.GenericData data)
        Create an enhanced generic datum reader with explicit specification of the GenericData instance to leverage, and with detailed monitoring disabled.
        Parameters:
        schema - The expected schema of Avro records to read.
        data - The generic data instance to leverage.
      • EnhancedGenericDatumReader

        public EnhancedGenericDatumReader​(org.apache.avro.Schema schema,
                                          org.apache.avro.generic.GenericData data,
                                          boolean withDetailedMonitoring)
        Create an enhanced generic datum reader with explicit specification of the GenericData instance to leverage, and with optional enabling of detailed monitoring.
        Parameters:
        schema - The expected schema of Avro records to read.
        data - The generic data instance to leverage.
        withDetailedMonitoring - Whether to enable detailed monitoring (not recommended for production use).
    • Method Detail

      • extractSimpleArrayElementInfo

        public static com.quartetfs.fwk.impl.Pair<Boolean,​org.apache.avro.Schema.Type> extractSimpleArrayElementInfo​(org.apache.avro.Schema arraySchema)
        Extract relevant information for the type of elements supported by a given simple Avro array.

        A simple Avro array is assumed to hold elements of one and only one non-null type, with optional support for null elements.

        That is, we may have an array of int elements in which no element may be null, an array of int elements that also accepts null elements, or an array made up entirely of null elements, but we cannot have an array that contains both int and double elements.

        Parameters:
        arraySchema - An Avro schema defining an array type.
        Returns:
        Pair with left = boolean (whether elements are nullable), and right = non-null element type (or null if there is none).
        Throws:
        RuntimeException - If not a simple array.
        See Also:
        extractSimpleUnionInfo(Schema)
      • extractSimpleUnionInfo

        public static com.quartetfs.fwk.impl.Pair<Boolean,​org.apache.avro.Schema.Type> extractSimpleUnionInfo​(org.apache.avro.Schema unionSchema)
        Assume a simple Union Avro type (i.e. no more than one non-null type) and extract relevant information (i.e. whether it is nullable, and the non-null type if any).
        Parameters:
        unionSchema - An Avro schema defining a Union type.
        Returns:
        Pair with left = boolean (whether the union is nullable), and right = non-null type (or null if there is none).
        Throws:
        RuntimeException - If not a simple union.
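The contract documented for extractSimpleUnionInfo can be made concrete with a dependency-free sketch. The names here are hypothetical stand-ins: the Type enum replaces org.apache.avro.Schema.Type, a List of types replaces the union schema, and Map.Entry replaces the Pair return type. The logic follows the description above (at most one non-null branch, otherwise a RuntimeException); it is not taken from the library's source.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the "simple union" extraction contract. */
class SimpleUnionSketch {

    /** Stand-in for org.apache.avro.Schema.Type, reduced to a few members. */
    enum Type { NULL, INT, DOUBLE, STRING }

    /**
     * Returns (nullable, nonNullType) for a union with at most one non-null branch.
     * The second component is null when the union contains no non-null branch.
     */
    static Map.Entry<Boolean, Type> extractSimpleUnionInfo(List<Type> branches) {
        boolean nullable = false;
        Type nonNull = null;
        for (Type branch : branches) {
            if (branch == Type.NULL) {
                nullable = true;
            } else if (nonNull == null) {
                nonNull = branch;
            } else {
                // a second non-null branch means the union is not "simple"
                throw new IllegalArgumentException("Not a simple union: " + branches);
            }
        }
        return new SimpleImmutableEntry<>(nullable, nonNull);
    }

    public static void main(String[] args) {
        System.out.println(extractSimpleUnionInfo(Arrays.asList(Type.NULL, Type.INT)));
        System.out.println(extractSimpleUnionInfo(Arrays.asList(Type.DOUBLE)));
    }
}
```

Under the same assumptions, the array variant would apply this check to the array's element schema when that element schema is a union.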
      • logStatistics

        public void logStatistics()
        Log the current state of aggregated statistics.

        NOTICE - This logs something only if detailed monitoring has been enabled (see the constructor options); otherwise it is a no-op.
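The no-op behaviour noted above can be pictured as a guard on the withDetailedMonitoring flag. This is a minimal sketch, assuming the flag gates both counting and logging (the source only states the latter); MonitoringGateSketch, recordRead and the in-memory logged list are hypothetical and stand in for the real statistics and logging facilities.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in showing logStatistics() gated by a monitoring flag. */
class MonitoringGateSketch {

    private final boolean withDetailedMonitoring;
    private long recordsRead;

    /** Captures log lines in memory so the sketch needs no logging framework. */
    final List<String> logged = new ArrayList<>();

    MonitoringGateSketch(boolean withDetailedMonitoring) {
        this.withDetailedMonitoring = withDetailedMonitoring;
    }

    /** Counts a read only when monitoring is enabled (assumption). */
    void recordRead() {
        if (withDetailedMonitoring) {
            recordsRead++;
        }
    }

    /** Does nothing unless detailed monitoring was enabled at construction. */
    void logStatistics() {
        if (!withDetailedMonitoring) {
            return;
        }
        logged.add("records read: " + recordsRead);
    }

    public static void main(String[] args) {
        MonitoringGateSketch monitored = new MonitoringGateSketch(true);
        monitored.recordRead();
        monitored.logStatistics();
        System.out.println(monitored.logged);
    }
}
```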

      • read

        public final org.apache.avro.generic.GenericRecord read​(org.apache.avro.generic.GenericRecord reuse,
                                                                org.apache.avro.io.Decoder in)
                                                         throws IOException
        Specified by:
        read in interface org.apache.avro.io.DatumReader<org.apache.avro.generic.GenericRecord>
        Overrides:
        read in class org.apache.avro.generic.GenericDatumReader<org.apache.avro.generic.GenericRecord>
        Throws:
        IOException
      • readRecord

        protected final Object readRecord​(Object old,
                                          org.apache.avro.Schema expected,
                                          org.apache.avro.io.ResolvingDecoder in)
                                   throws IOException
        Overrides:
        readRecord in class org.apache.avro.generic.GenericDatumReader<org.apache.avro.generic.GenericRecord>
        Throws:
        IOException
      • readField

        protected final void readField​(Object r,
                                       org.apache.avro.Schema.Field f,
                                       Object oldDatum,
                                       org.apache.avro.io.ResolvingDecoder in,
                                       Object state)
                                throws IOException
        Overrides:
        readField in class org.apache.avro.generic.GenericDatumReader<org.apache.avro.generic.GenericRecord>
        Throws:
        IOException
      • readWithoutConversion

        protected final Object readWithoutConversion​(Object old,
                                                     org.apache.avro.Schema expected,
                                                     org.apache.avro.io.ResolvingDecoder in)
                                              throws IOException
        Overrides:
        readWithoutConversion in class org.apache.avro.generic.GenericDatumReader<org.apache.avro.generic.GenericRecord>
        Throws:
        IOException
      • readArray

        protected final Object readArray​(Object old,
                                         org.apache.avro.Schema expected,
                                         org.apache.avro.io.ResolvingDecoder in)
                                  throws IOException
        Overrides:
        readArray in class org.apache.avro.generic.GenericDatumReader<org.apache.avro.generic.GenericRecord>
        Throws:
        IOException
      • doReadWithoutConversion

        protected Object doReadWithoutConversion​(Object old,
                                                 org.apache.avro.Schema expected,
                                                 org.apache.avro.io.ResolvingDecoder in)
                                          throws IOException
        Throws:
        IOException
      • doReadArray

        protected Object doReadArray​(Object old,
                                     org.apache.avro.Schema expected,
                                     org.apache.avro.io.ResolvingDecoder in)
                              throws IOException
        Throws:
        IOException