The CSV file format in Effort

This post describes the file format that is compatible with the CsvDataLoader component of the Effort library.

The component accepts files that follow the traditional CSV format:

  • The first row contains the header names
  • Comma ( , ) is the separator character
  • Double quote ( " ) is the delimiter character
  • Two consecutive double quotes ( "" ) are used to express a single double quote between delimiters

There are some additional requirements that need to be taken into consideration.

  • Numbers and dates are parsed with the invariant culture
  • Binary values are encoded in base64 format
  • Null values are represented by empty fields without delimiters
  • Empty strings are represented by empty fields with delimiters
  • Backslash serves as the escape character for backslash and newline characters

These are all the rules that need to be followed. The next example demonstrates them with a compatible CSV file.

id,name,birthdate,reportto,storages,photo
"JD","John Doe",01/23/1982,"MHS","\\\\server1\\share8\r\n\\\\server2\share3",
"MHS","Michael ""h4x0r"" Smith",05/12/1975,,"","ZzVlKyszZjQ5M2YzNA=="

The content of each database table is represented by a dedicated CSV file that has to be named {table name}.csv.
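The escaping rules above can be sketched with a small field encoder. This is only an illustrative helper under the stated rules; CsvFieldEncoder is a hypothetical name, not part of the Effort library:

```csharp
using System;
using System.Globalization;

// Hypothetical helper that formats a single value as a field
// compatible with the rules listed above.
public static class CsvFieldEncoder
{
    public static string Encode(object value)
    {
        if (value == null)
        {
            return string.Empty;                    // null: empty field, no delimiters
        }

        byte[] binary = value as byte[];
        if (binary != null)
        {
            value = Convert.ToBase64String(binary); // binaries: base64
        }

        // Numbers and dates are formatted with the invariant culture
        string text = Convert.ToString(value, CultureInfo.InvariantCulture);

        text = text
            .Replace("\\", "\\\\")                  // escape backslashes first
            .Replace("\r", "\\r")                   // escape newline characters
            .Replace("\n", "\\n")
            .Replace("\"", "\"\"");                 // double the double quotes

        return "\"" + text + "\"";                  // delimited field
    }
}
```

Note that Encode("") produces a delimited empty field, while Encode(null) produces a bare empty field, preserving the null/empty string distinction described above.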


Data loaders in Effort

Data loaders are useful components of the Effort library that were designed to help set up the initial state of a fake database.

Adding records to the tables through the Entity Framework API can be inflexible, and the resulting code might become hard to maintain. Furthermore, these insert operations flow through the entire EF and Effort pipeline, which can have a significant performance impact. Data loaders solve these problems by allowing data to be inserted from any custom source during initialization, with very small overhead.

They are really easy to use: all the developer has to do is create a data loader instance and pass it to the chosen Effort factory method. For example:

var dataLoader = new EntityDataLoader("name=MyEntities");

var connection = DbConnectionFactory.CreateTransient(dataLoader);

Effort provides multiple built-in data loaders:

  • EntityDataLoader
  • CsvDataLoader
  • CachingDataLoader

EntityDataLoader is able to fetch data from an existing database by utilizing an existing Entity Framework compatible ADO.NET provider. It is initialized with an entity connection string.

var dataLoader = new EntityDataLoader("name=MyEntities");

The purpose of CsvDataLoader is to read data records from CSV files. It is initialized with a path that points to a folder containing the CSV files. Each file represents the content of a database table.

var dataLoader = new CsvDataLoader(@"C:\path\to\files");

The exact format of these CSV files is documented in a separate post. There is also a little tool that helps developers export the data of an existing database into appropriately formatted CSV files.

The CachingDataLoader was designed to speed up the initialization process by wrapping any kind of data loader with a caching layer. The first time the wrapped data loader is used with a specific configuration, the CachingDataLoader pulls the required data from it; as a side effect, this data is cached in memory. If a CachingDataLoader is later initialized to wrap the same kind of data loader with the same configuration, the data is retrieved from the previously created cache and the wrapped data loader is not utilized.

var wrappedDataLoader = new CsvDataLoader(@"C:\path\to\files");

var dataLoader = new CachingDataLoader(wrappedDataLoader, false);

Each data loader suits different scenarios. I suggest using EntityDataLoader during interactive testing, while the combination of CachingDataLoader and CsvDataLoader can be really useful in automated tests.

Using Effort in complex applications

In a previous post, some of the basic features of the Effort library were introduced with very simple code samples. You might have asked: okay, but how should I use this tool in a complex application? This post tries to answer that question.

Applications have to be designed properly in order to make them easily testable. A traditional technique called decoupling is widely used to achieve testability. It means that your system has to be built up from individual pieces (usually referred to as components) that can be easily disassembled and reassembled. A previous post demonstrates a technique that makes it possible to decouple Entity Framework based data-driven applications. Do not continue without reading and understanding it! The solution presented here relies on the architecture described there.

In a complex application, multiple instances of the same ObjectContext type might be used while serving a single request, and these instances usually have to work on the same database. The ObjectContextFactory class of the Effort library is only capable of creating individual ObjectContext instances without any relation between them, so each created instance works with a completely separate fake database. If you want to create ObjectContext instances that work on the exact same database, you should create a fake EntityConnection object and pass it to all of the ObjectContext instances. The Effort library provides the EntityConnectionFactory class for exactly this purpose. Its factory methods accept an entity connection string as argument and return an EntityConnection object that communicates with an in-process in-memory fake database.
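As a quick sketch of this idea (assuming, as shown later in this post, that the generated context class exposes a constructor accepting an EntityConnection):

```csharp
// One fake connection shared by two context instances: both operate
// on the same in-process in-memory database.
EntityConnection connection =
    Effort.EntityConnectionFactory.CreateTransient("name=ClubEntities");

using (ClubEntities context1 = new ClubEntities(connection))
using (ClubEntities context2 = new ClubEntities(connection))
{
    // Changes saved through context1 are visible through context2,
    // because the two contexts work on the exact same database.
}
```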

But how should this factory class be used in the architecture presented in the mentioned blog post? There are two kinds of possible injection points in that system: slots with the IObjectContextFactory or IConnectionProvider interfaces. The latter is ideal for this scenario, because a single connection provider component is shared among the different data components, so they can use the exact same connection object. Before creating the fake component, let's take a look at the implementation of the original production component.

public class DefaultConnectionProvider : ConnectionProviderBase
{
    protected override EntityConnection CreateConnection()
    {
        return new EntityConnection("name=ClubEntities");
    }
}

The ConnectionProviderBase class makes the implementation really self-evident. This base class can be used to create the new fake connection provider component too. Simply use the mentioned factory class of the Effort library to instantiate the fake EntityConnection instance.

public class FakeConnectionProvider : ConnectionProviderBase
{
    protected override EntityConnection CreateConnection()
    {
        return Effort.EntityConnectionFactory.CreateTransient("name=ClubEntities");
    }
}

That’s it! Now just use the FakeConnectionProvider class instead of the DefaultConnectionProvider class (in the same way), and the data operations initiated by the data components will be executed on a single in-memory fake database. This way, automated tests can be created without depending on an external database.

As shown, using the Effort library in complex data-driven applications may require careful architectural design. However, in a properly built system it can be integrated without too much effort.

Introducing Effort

So what is Effort? It stands for Entity Framework Fake ObjectContext Realization Tool, which is basically exactly what it is meant to do. Creating automated tests for data-driven applications has never been a trivial task. This is also true for Entity Framework based applications: implementing a proper fake ObjectContext or DbContext class requires great effort. Oh, sure… 🙂

This library approaches the problem in a very different way: it emulates the resource-heavy external database with a lightweight in-process in-memory database. This makes it possible to run your tests rapidly without the presence of an external database. By the end of this blog post, you will see exactly how.

Effort can be downloaded from Codeplex or installed with NuGet. It is really convenient to use: you practically don't have to modify your existing ObjectContext or DbContext classes at all. The following example presents this. Let's assume that we have an ObjectContext class called NorthwindEntities.

using(NorthwindEntities context = new NorthwindEntities())
{
    return context.Categories.ToList();
}

This code returns all the categories stored in the database. A simple modification is enough to make Entity Framework use a fake in-memory database instead:

using(NorthwindEntities context = 
    Effort.ObjectContextFactory.CreateTransient<NorthwindEntities>())
{
    return context.Categories.ToList();
}

The term “transient” refers to the lifecycle of the underlying in-memory database. The owner ObjectContext (technically the DbConnection) uses a completely unique database instance. If the context/connection is disposed, then the database is disposed too. If you run this code, an empty collection is returned. This is self-evident, since the fake database is completely empty.
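To see this isolation in action, consider the following sketch: each CreateTransient call produces an independent database instance, so two contexts created this way never see each other's data.

```csharp
using (NorthwindEntities context1 =
    Effort.ObjectContextFactory.CreateTransient<NorthwindEntities>())
using (NorthwindEntities context2 =
    Effort.ObjectContextFactory.CreateTransient<NorthwindEntities>())
{
    // context1 and context2 are backed by two separate in-memory
    // databases; entities added through one never appear in the other.
}
```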

You could set the initial state of the database with Entity Framework, but Effort provides data loaders to do this more easily. The following example fetches the initial data from a real database:

IDataLoader loader = new EntityDataLoader("name=NorthwindEntities");

using(NorthwindEntities context = 
    ObjectContextFactory.CreateTransient<NorthwindEntities>(loader))
{
    return context.Categories.ToList();
}

This code returns exactly the same collection of entities as the first one, but there is a very big difference: you can do anything you want with this data context, and the result will be the same every time. The following example proves this:

IDataLoader loader = new EntityDataLoader("name=NorthwindEntities");

using(NorthwindEntities context = 
    ObjectContextFactory.CreateTransient<NorthwindEntities>(loader))
{
    foreach (Category cat in context.Categories)
    {
        context.Categories.DeleteObject(cat);
    }
    context.SaveChanges();
}

using(NorthwindEntities context = 
    ObjectContextFactory.CreateTransient<NorthwindEntities>(loader))
{
    return context.Categories.ToList();
}

The first part of this code deletes all the categories from the database. Nevertheless, the second part will return exactly the same collection as before. If you run the code again, the first part will have to delete the entities again; the object set will never be empty.

Furthermore, you can completely eliminate the need for the external database. Export your data tables into local CSV files (Effort provides a tool to do this easily) and use the CSV data loader.

IDataLoader loader = new CsvDataLoader(@"C:\PathOfTheCsvFiles");

using(NorthwindEntities context = 
    ObjectContextFactory.CreateTransient<NorthwindEntities>(loader))
{
    return context.Categories.ToList();
}

If you run this code, there will be zero communication with the external database, yet it behaves exactly as if there were one.

As you can see, Effort makes it possible to create automated tests for Entity Framework applications in a very convenient and powerful way. The tests can run without the presence of any external database engine. Each test can work on a completely unique database instance, so their actions are completely isolated, and they can even run concurrently. The initial state of the database they work on can be set easily too.

Future posts will reveal the capabilities of Effort for different scenarios.

Designing loosely coupled Entity Framework based applications

The main purpose of designing loosely coupled applications is to reduce the dependencies between the components of the system. This provides more flexibility, maintainability, and modularity, which are very important, especially in long-term projects.

Dependency inversion

To achieve this, it is highly recommended to follow the dependency inversion principle (one of the SOLID principles):

  • High level modules should not depend upon low level modules. Both should depend on abstractions.
  • Abstractions should not depend upon details. Details should depend upon abstractions.

[Diagram: the TopComponent depends on the ILowComponent abstraction, which the LowComponent implements]

In practice this means that a class should not depend directly on another class (detail) but on an interface (abstraction). The TopComponent has a member (socket) that accepts any component implementing the ILowComponent interface. This way, the TopComponent does not depend on the LowComponent. The ILowComponent interface is defined by the top component, so the low component actually depends on the top component (inverted dependency). This programming technique is usually referred to as component-based software engineering. In this blog post I will introduce a loosely coupled architecture for Entity Framework based applications that uses this method.
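In code, the structure described above could look like this minimal sketch (the member and method names beyond TopComponent, LowComponent, and ILowComponent are illustrative):

```csharp
// Abstraction defined by (and owned by) the top component
public interface ILowComponent
{
    void DoWork();
}

// High level module: depends only on the abstraction
public class TopComponent
{
    private readonly ILowComponent socket;

    public TopComponent(ILowComponent socket)
    {
        this.socket = socket;
    }

    public void Run()
    {
        this.socket.DoWork();
    }
}

// Low level module: depends on the abstraction, not on TopComponent
public class LowComponent : ILowComponent
{
    public void DoWork()
    {
        // ...
    }
}
```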

The demo application

The demo application manages the content of a database with a very simple schema: a single table with a few data fields and an auto-generated primary key. I created a new Entity Framework model and imported the database schema.

[Diagram: the Entity Framework model created from the imported database schema]

Components

The main part of the demo application is a simple data component that manages the content of the database. It can be described with the following interface:

public interface IDataService
{
    Person GetPerson(int id);

    void StorePerson(Person person);

    void DeletePerson(Person person);
}

How would the GetPerson method be implemented in a RAD project using Entity Framework? This is a possible solution:

public Person GetPerson(int id)
{
    if (id < 0)
    {  
         return new Person();
    }

    using (ClubEntities context = new ClubEntities())
    {
        return context.People.Single(p => p.Id == id);
    }
}

There are multiple problems with this implementation:

  • Smaller problem: the component completely depends on Entity Framework.
  • Bigger problem: the initialization of the ClubEntities object context class is baked into the implementation of the component, so there will be no easy way to alter it later (imagine how many times the new ClubEntities() expression could be written in a large application).

The former can be solved by disabling the code generation of the EDM and implementing the entity classes as plain old CLR objects and the object context class with a component-based approach. This would require a lot of work if there were many tables in the database, so instead I applied this POCO T4 template to alter the code generation. I also modified it a little to make it generate an interface for the object context class. This is the generated code:

public partial class Person
{
    public virtual int Id { get; set; }
    
    public virtual string Name { get; set; }
    
    public virtual string Title { get; set; }
}

public partial interface IClubEntities : IDisposable
{
    IObjectSet<Person> People { get; }
    
    ObjectStateManager ObjectStateManager { get; }
    
    int SaveChanges();
}

public partial class ClubEntities : ObjectContext, IClubEntities
{
   // Self-evident implementation
}

It surely could have been done in an even more ORM-independent way, but this will be enough for now.

The latter problem can be solved by decoupling the functionality: the data component should not instantiate the object context class; this role should belong to a separate component. Let this component be a factory component whose interface can be defined like this:

public interface IObjectContextFactory
{
    IClubEntities Create();
}

This is everything we need to be able to rewrite the implementation of the data service component.

public class DataService : IDataService
{
    private IObjectContextFactory contextFactory;

    public DataService(IObjectContextFactory contextFactory)
    {
        this.contextFactory = contextFactory;
    }

    public Person GetPerson(int id)
    {
        if (id < 0)
        {  
            return new Person();
        }

        using (IClubEntities context = this.contextFactory.Create())
        {
            return context.People.Single(p => p.Id == id);
        }
    }
}

The object context factory component is a member of the data service component and can be set through its constructor. The query method instantiates the object context using the factory component. Notice that we were able to implement the data component without implementing the factory component. This is one of the main advantages of component-based software engineering.

The next task is to implement the factory component. The ObjectContext class – the base of ClubEntities – provides multiple constructors. One of them expects no arguments and uses the default, baked-in connection string to initialize itself. The other two expect an EntityConnection object or a connection string. The former will be used this time. But how should the EntityConnection object be created? Let's create another component that serves as a provider for this kind of object. Its interface is defined like this:

public interface IConnectionProvider : IDisposable
{
    EntityConnection Connection { get; }
}

Note that this is a disposable component, because it also handles the lifetime of the provided connection object. If the provider component is disposed, then the connection object is disposed too. The reason behind this will be explained later. For now, we can implement the IObjectContextFactory interface; it is really self-evident:

public class ObjectContextFactory : IObjectContextFactory
{
    private IConnectionProvider provider; 

    public ObjectContextFactory(IConnectionProvider provider)
    {
        this.provider = provider;
    }

    public IClubEntities Create()
    {
        return new ClubEntities(this.provider.Connection);
    }
}

So what is the motivation behind the connection provider component? If an ObjectContext is instantiated with the parameterless or connection string constructor, disposing it also disposes the internal connection object. If it is instantiated with an external connection object, that connection is not disposed when the ObjectContext instance is disposed. The following sequence diagrams might help to understand the different mechanisms.

[Sequence diagram: disposing an ObjectContext created with a connection string also disposes the internal connection]

[Sequence diagram: disposing an ObjectContext created with an external EntityConnection leaves the connection open]

This means that the externally created EntityConnection instance should be disposed manually, or left to GC finalization (which is not recommended). You might ask why we can't use the other constructor. The main reason is that I want to ensure that the ObjectContext instances use the same EntityConnection instance. There is more than one reason for this:

  • TransactionScope: if two connection objects are used in the same transaction scope, the operations might be executed in a distributed transaction, which requires DTC (even though one connection object would be enough).
  • Automated testing: this is explained in another blog post.

Now that the purpose of the connection provider component has been explained, a base implementation can be created:

public abstract class ConnectionProviderBase : IConnectionProvider
{
    private EntityConnection connection;

    public EntityConnection Connection
    {
        get 
        {
            if (this.connection == null)
            {
                this.connection = this.CreateConnection();
            }

            return this.connection;
        }
    }

    protected abstract EntityConnection CreateConnection();

    public void Dispose()
    {
        if (this.connection != null)
        {
            this.connection.Dispose();
        }
    }
}

There is nothing complicated here. The component uses lazy initialization: the connection object is only created when it is first requested. If the component is disposed, then the connection object is disposed too (if it exists). It is really easy to create a concrete component based on this implementation:

public class DefaultConnectionProvider : ConnectionProviderBase
{
    protected override EntityConnection CreateConnection()
    {
        return new EntityConnection("name=ClubEntities");
    }
}

This is quite self-evident. At this point there is no problem with baking the connection configuration in, because it can easily be replaced by implementing another component derived from the same base class.

Assembling the components

All the required components are ready, there is only one thing to do: assemble them. The following component diagram shows the simplest configuration:

[Component diagram: the façade component shares a single connection provider with the object context factory and data service components]

The façade component serves as a top-level component that is aware of the lifetime of the request currently served by the application. It holds a reference to the connection provider component that is used by all the object context factory components. At the appropriate time (typically at the end of the request), it disposes the connection provider component, which results in the disposal of the connection object. A simple implementation of the façade component could look like this:

using (IConnectionProvider provider = new DefaultConnectionProvider())
{
    IDataService dataService = new DataService(new ObjectContextFactory(provider));
    
    Person person = dataService.GetPerson(id);
    person.Title = "Software engineer";

    dataService.StorePerson(person);
}

It is easy to imagine what a complex application would look like. The following diagram is similar to the previous one, but it uses three types of data service to handle requests.

[Component diagram: three data service components using the same connection provider component instance]

Note that all the object context factory components and the façade component use the same connection provider component instance. This architecture ensures that the data service components use the same connection object, which will also be disposed at the appropriate time.

Conclusion

Designing loosely coupled applications is not trivial. Finding the proper way to decouple functionality requires a deep understanding of the environment and the planned features of the application being designed.