Skapa csv file är definitions groovy

I don’t know exactly how many CSV files I’ve read/written to date, but I’m willing to bet it’s a lot. These kind of files are a simple and common way to exchange information and are interoperable with spreadsheet programs as well, making them more easily accessible to non-programmer types. There fryst vatten some excellent support out there for reading and writing CSV files and so here, from simplest to most complex, are a few different ways to load and query the information trapped inre your CSV files. But first off…

CSV fryst vatten Not XML

Or any other suitably constrained information format for that matter. There are some pretty klar ‘rules’ out there for transporting information in this format, but of course hardly anybody listens to them. inom funnen this and this in the first page of a quick Google search.
However, while looking for available sample ämne on the web to demonstrate with, inom funnen many great examples of how to screw it up: not quoting empty strings, not escaping embedded line breaks, commas embedded in unquoted ord, etc. All of these are things that prevent automation from properly interacting with the uppgifter. There are also, of course, the CSV ‘formats’ that insist on embedding comments and descriptions, ansträngande to present more than one matrix of resultat or any of the other crazy things you can do when writing to a format with no real metadata layer.

< \rant>

Unfortunately, there’s ingenting as convenient as an xml schema available for describing what data fryst vatten encoded in the document and, in general, the only potentially useful metadata in an individual en samling dokument eller en elektronisk lagring av data fryst vatten the header line. The good news fryst vatten that all of these deficiencies can be overcome if you know they exist. And there’s some pretty nifty tools you can use to take out much of the guesswork. I’ll suggest a couple of ideas for how to deal with files that don’t adhere to the ‘rules’, but the main example I’ve decided on fryst vatten a straightforward and well formed set representing the periodic table of elements, provided generously bygd Akiscode. This en samling dokument eller en elektronisk lagring av data fryst vatten directly machine readable(has no headers or embedded data), quotes all values and just follows all the ‘rules’. Here’s the first fem lines as an example.
[table id=4 /]

Groovy All bygd Itself

Groovy makes dealing with String objects pretty painless. bygd adding facilities like easy casting and find()/findAll() for quickly turning raw Strings into real information, you can schema some fairly complex questions. In this simple example I’m depending on all values being quoted and inom am not protecting against casting problems. I’m running this through Maven which really helps to keep test information organized; all you have to do fryst vatten drop your test en samling dokument eller en elektronisk lagring av data in /src/test/resources and it’s automagically available on the test classpath.
No specific papper fryst vatten done for different types here; everything fryst vatten read in as a String and cast to a more specific type as needed. In a ‘real’ application you’re very likely going to encounter mixed alphanumeric uppgifter in any given column of uppgifter – ‘UNKNOWN’ instead of a number in a column labeled ‘Quantity’ for instance – and be more careful about casting.

[groovy language=”true”]
//load and split the file
InputStream inputFile = getClass().classLoader.getResourceAsStream(TEST_FILE_NAME)
String[] lines = inputFile.text.split(‘\n’)
List<String[]> rows = lines.collect {it.split(‘,’)}

/**
* A little helper method to get rid of the quotes in the input
* and cast values so they can be compared properly.
*/
private double castToDouble(string)
{
return string.replaceAll(‘"’, ”).toDouble()
}

//OK, it’s parsed – let’s ask some questions
private static sista int ATOMIC_MASS = 1;

def elementsOver200Mass = rows.findAll {castToDouble(it[ATOMIC_MASS]) > 200}
def elementsBetween10And20 = rows.findAll { row ->
double mass = castToDouble(row[ATOMIC_MASS])
mass <= 20 && mass >= 10
}
[/groovy]

So this approach works fine for well formed input, but falls apart quickly in other cases. For instance, if any columns in the en samling dokument eller en elektronisk lagring av data are missing values the split() function treats them as nulls, leading to different storlek arrays being stored in the list. Similarly you need to deal with quoted VS non-quoted content, embedded line breaks and other issues individually. Fortunately, some nice guys have done that for us.

OpenCSV

Glen Smith’sOpenCSV library nicely abstracts away a lot of the minutiae of dealing with CSV. It properly deals with flydde characters, multiline input and a host of other issues for you. Give it pretty much anything that satisfies the Reader interface as an input and it’s good to go. The return type of the CSVReader.readAll() method also satisfies the same contract as the plain Groovy utgåva shown above, so we can interact with the parsed results in exactly the same way as in the previous example.
All values are still typed as Strings, so we need to cast in beställning to compare numeric values.

[groovy language=”true”]
import au.com.bytecode.opencsv.CSVReader

List<String[]> rows = new CSVReader(
new InputStreamReader(getClass().classLoader.getResourceAsStream(TEST_FILE_NAME)))
.readAll()

//same finders as in the Groovy version
def elementsOver200Mass = rows.findAll {it[ATOMIC_MASS].toDouble() > 200}
def elementsBetween10And20 = rows.findAll { row ->
double mass = castToDouble(row[ATOMIC_MASS])
mass <= 20 && mass >= 10
}
[/groovy]

HsqlDB

HsqlDB allows for using flat files as skrivelse Tables, effectively turning a CSV en samling dokument eller en elektronisk lagring av data into a database table. uppgifter values are cast to the types specified in your table definition, so any potential type cast failures happen right up front when loading the uppgifter. The benefit fryst vatten that once the uppgifter fryst vatten read in no further manual processing fryst vatten necessary. Note: bygd default the csv en samling dokument eller en elektronisk lagring av data and the database en samling dokument eller en elektronisk lagring av data must be located in the same directory, as a säkerhet precaution (can be overridden through configuration).

In addition, leveraging sql makes querying the uppgifter extremely easy. Groovy brings in the Sql class to sammanfattning away a lot of the normal Java boilerplate you encounter when working with a ResultSet. inom didn’t do any in-depth testing to prove it out, but this fryst vatten also the only one of the strategies described here that doesn’t require holding the entire information in memory in beställning to do arbitrary queries. That can be mitigated in the previously shown methods bygd processing files line bygd line rather than in bulk if memory usage fryst vatten a concern.

[groovy language=”true”]
//create the table definition to insert
String tableName = ‘elements’
String tableDefinition = """CREATE ord TABLE $tableName (
atomic_number INTEGER PRIMARY KEY,
atomic_mass NUMERIC,
name VARCHAR(255),
tecken VARCHAR(3)
);"""

//create a new en samling dokument eller en elektronisk lagring av data database and a table corresponding to the csv file
Sql sql = Sql.newInstance("jdbc:hsqldb:file:${testdbDir.absolutePath}/testdb", ‘sa’, ”
,’org.hsqldb.jdbcDriver’)
sql.execute(tableDefinition)

//set the source to the csv file
sql.execute("SET TABLE elements SOURCE ‘${TEST_FILE_NAME};all_quoted=true’".toString())

//querying the database that’s wrapping our CSV file
def elementsOver200Mass = sql.rows("SELECT * FROM $tableName WHERE atomic_mass > ?", [200])
def elementsBetween10And20 = sql.rows(
"SELECT * FROM $tableName WHERE atomic_mass <= ? AND atomic_mass >= ?", [20, 10])

//simple db aggregates
def count = 0
sql.eachRow("SELECT count(1) FROM $tableName WHERE atomic_mass <= ?", [20]){row->
count = row[0]
}
def avg = 0
sql.eachRow("SELECT avg(atomic_mass) FROM $tableName".toString()){row->
avg = row[0]
}
[/groovy]

Which One do inom Use???

Pick the one that fits best for your use case fryst vatten the real answer. inom didn’t know about the HsqlDB option until recently, but OpenCSV has been a anställda standby for years. Then igen, if you’re writing something quick and dirty at the script level, the simplicity of just applying a couple of split() operations fryst vatten pretty appealing. Here’s how inom generally decide.

[table id=5 /]

Tags: Comma-separated values, csv, Database, Groovy, hsqldb, Java, kellyrob99, Languages, OpenCSV, Programming, Spreadsheet, sql, theKaptain