jpelcis
Joined: 09 Mar 2006 Posts: 40
|
Posted: Sat Jul 01, 2006 3:15 pm Post subject: Call for Discussion: dbi.Row |
|
|
The quote below comes directly from the SQL:2003 Specification, 5th Working Draft. To see what led to this, go to http://www.dsource.org/forums/viewtopic.php?t=1637.
Chapter 2, Page 11 wrote: |
SQL defines predefined data types named by the following <key word>s: CHARACTER, CHARACTER VARYING, CHARACTER LARGE OBJECT, BINARY LARGE OBJECT, NUMERIC, DECIMAL, SMALLINT, INTEGER, BIGINT, FLOAT, REAL, DOUBLE PRECISION, BOOLEAN, DATE, TIME, TIMESTAMP, and INTERVAL.
|
There are also XML and other types in previous specifications, but these are the critical ones. Most of these can be converted to either a D basic type or something in the standard library.
Code: |
CHARACTER               -> dchar
CHARACTER VARYING       -> char[]
CHARACTER LARGE OBJECT  -> char[]
BINARY LARGE OBJECT     -> void[]
NUMERIC                 -> char[] (no equivalent)
DECIMAL                 -> char[] (no equivalent)
SMALLINT                -> short
INTEGER                 -> int
BIGINT                  -> long
FLOAT                   -> float
REAL                    -> double
DOUBLE PRECISION        -> double
BOOLEAN                 -> enum {false, true, unknown} (bool doesn't have unknown)
DATE                    -> std.date.Date
TIME                    -> std.date.Date
TIMESTAMP               -> std.date.Date
INTERVAL                -> std.date.Date
|
Luckily, many of these end up as the same D type. Row can be changed to:
Code: |
enum Boolean {
False = false,
True = true,
Unknown
}
final class Row {
this (char[][] names, char[][] stringValues, ...);
alias getString opIndex; // Note: size_t instead of int.
deprecated alias getString get; // Close enough.
deprecated int getFieldType (int index); // Need a standardized type.
deprecated char[] getFieldDecl (int index); // Most DBDs can't get this info.
deprecated void addField (char[] name, char[] value, char[] decl, int type); // See above two.
void addField (char[] name, char[] stringValue, char[] value, TypeInfo type = typeid(char[]));
char[] getFieldName (size_t index);
size_t getFieldIndex (char[] name); // Note slight change.
TypeInfo getFieldType (size_t index);
TypeInfo getFieldType (char[] name);
dchar getChar (size_t index);
dchar getChar (char[] name);
char[] getString (size_t index);
char[] getString (char[] name);
void[] getObject (size_t index);
void[] getObject (char[] name);
short getShort (size_t index);
short getShort (char[] name);
int getInt (size_t index);
int getInt (char[] name);
long getLong (size_t index);
long getLong (char[] name);
float getFloat (size_t index);
float getFloat (char[] name);
double getDouble (size_t index);
double getDouble (char[] name);
Boolean getBoolean (size_t index);
Boolean getBoolean (char[] name);
std.date.Date getDate (size_t index);
std.date.Date getDate (char[] name);
private:
char[][] fieldNames;
char[][] stringValues;
TypeInfo[] types;
long longValue;
double doubleValue; // Not real for consistency.
}
|
There are several things that could be done to improve this, but I wanted to get someone else's opinion first.
My primary reservation is how big Row would be. One possible solution is to get rid of the getChar, getShort, getInt, and getFloat functions. The values could be cast up to the bigger type and remain that way for the function return.
Another is the use of TypeInfo, which could be avoided in the case of addField by the use of a template, but what about getFieldType?
Also, it is debatable whether the conversion should be done in Row or in Result. My opinion is to do it in Row, using whatever type is specified when the value is added. The problem is what to do with, for example, PostgreSQL, which doesn't seem to have a method to get the type of something. Should the use of a string be forced, should the user be allowed to attempt to get the value in any format (with a DBIException or ConvException thrown on failure), or should dbi.pg.PgResult attempt to decipher the type of the string?
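To make the second option concrete, a conversion-on-demand accessor might look something like this (purely a hypothetical sketch; stringValues, types, and longValue are the members from the proposal above, and the exact conversion helper is whatever Phobos provides):
Code: |
// Sketch only: the caller requests an int; the conversion throws
// if the stored string isn't numeric, surfacing the failure exactly
// as described above.
int getInt (size_t index) {
    if (types[index] !is typeid(char[])) {
        return cast(int) longValue; // a native numeric value is already stored
    }
    return std.conv.toInt(stringValues[index]); // throws on malformed input
}
|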
Anyway, I just thought I would throw this out before I started coding. Whatever is decided on here will be in 0.3.0, with anything deprecated removed in 0.4.0.
Last edited by jpelcis on Thu Jul 13, 2006 10:11 am; edited 3 times in total |
brad Site Admin
Joined: 22 Feb 2004 Posts: 490 Location: Atlanta, GA USA
|
Posted: Sat Jul 01, 2006 3:53 pm Post subject: |
|
|
This plan looks fine to me. I just wanted feedback on doing more than just get() as was the case in 0.2.0.
One question, are you thinking ahead on platforms available to D with size_t?
BA |
jpelcis
Joined: 09 Mar 2006 Posts: 40
|
Posted: Sat Jul 01, 2006 4:07 pm Post subject: |
|
|
brad wrote: |
This plan looks fine to me. I just wanted feedback on doing more than just get() as was the case in 0.2.0.
|
Cool. That's good to hear.
brad wrote: |
One question, are you thinking ahead on platforms available to D with size_t?
|
That's mostly it, although I believe it can also be more efficient to use size_t on x86 platforms since it matches the CPU's native word size and registers. I might be wrong, though. |
brad Site Admin
Joined: 22 Feb 2004 Posts: 490 Location: Atlanta, GA USA
|
Posted: Sat Jul 01, 2006 4:38 pm Post subject: |
|
|
Sounds good.
I wouldn't worry about size of the class and would leave those methods in for now, and not cast to the larger types. We'll see what a profiler says about this at a much later date.
Along those lines, I was looking at getFieldNames()
http://www.dsource.org/projects/ddbi/browser/trunk/dbi/row.d#L83
and in talking to larsivi, we decided to leave it as is. I was considering keeping a separate associative array with field names as keys and the field indices in the other array (fieldValues, I think) as values. We would incur more storage, but maybe have faster index lookups than this method. But because it's only a problem on huge rows, and given the youth of this lib, he cautioned me not to prematurely optimize...
BA |
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
|
Posted: Wed Jul 12, 2006 10:58 pm Post subject: |
|
|
brad wrote: | I wouldn't worry about size of the class and would leave those methods in for now, and not cast to the larger types. We'll see what a profiler says about this at a much later date. |
Hrm ~ I'd be careful about that, since removing functionality is considerably harder than adding it (deprecation can be tough for some users).
The getShort() and getInt() could easily be bound together (assuming meta info is present in the row description, identifying the original type), and there's no performance penalty in doing so since a return value in D is 32bit minimum anyway ~ the language is not targeted for 16bit devices. The getFloat() should perhaps be combined with getDouble(), if only because nobody really uses float anymore? Maybe not.
The getChar() is a bit different, since it may be advantageous to have it return dchar instead? UTF support can become a tad messy, and array types are often treated quite differently than single instances. To illustrate: while it might be perfectly acceptable to support only utf8 strings, you can't stuff a single utf8 sequence into a char ... it has to be dchar instead (dependent upon what the underlying DB actually exposes).
Thus, you might have:
Code: |
dchar getChar();
char[] getString(); |
It's a shame these D types were not called utf8, utf16, and utf32 instead of char, wchar, and dchar.
On the other hand, if you were to support returning arrays in general, then type-folding becomes a royal pain for the user. But since you're using the Row approach, this latter aspect really shouldn't become an issue. |
jpelcis
Joined: 09 Mar 2006 Posts: 40
|
Posted: Wed Jul 12, 2006 11:20 pm Post subject: |
|
|
kris wrote: | brad wrote: | I wouldn't worry about size of the class and would leave those methods in for now, and not cast to the larger types. We'll see what a profiler says about this at a much later date. |
Hrm ~ I'd be careful about that, since removing functionality is considerably harder than adding it (deprecation can be tough for some users).
|
True. That's a large part of why I am announcing this so far ahead of time (the other, of course, is for opinions).
kris wrote: |
The getShort() and getInt() could easily be bound together (assuming meta info is present in the row description, identifying the original type), and there's no performance penalty in doing so since a return value in D is 32bit minimum anyway ~ the language is not targeted for 16bit devices. |
Unfortunately, that's where the trouble comes. PostgreSQL doesn't give any of that information to the application. The question is whether the user should be allowed to call the functions (and attempt conversion), whether D DBI should guess, or whether functionality should be limited to string only as it is now.
I'm not sure exactly what you mean by bound together, but my guess is that you are saying there should only be a getInt(), with a getShort defined as cast(short)getInt(). Is that correct?
kris wrote: | The getFloat() should perhaps be combined with getDouble(), if only because nobody really uses float anymore? Maybe not. |
Float is still used for some purposes (in fact, video cards often use "half" instead of "double"). Float support could be done as described above for the integers.
kris wrote: |
The getChar() is a bit different, since it may be advantageous to have it return dchar instead? UTF support can become a tad messy, and array types are often treated quite differently than single instances. To illustrate, while it might be perfectly acceptable to support only utf8 strings, you can't stuff a single utf8 sequence into a char ... has to be dchar instead. |
Good catch. I've changed the proposal to return dchar instead of char. Would it be worth it to return dchar[] too, while we're at it (assuming we get implicit casting between character types)? |
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
|
Posted: Thu Jul 13, 2006 12:50 am Post subject: |
|
|
I was suggesting that getShort() and getFloat() be removed.
Regarding dchar[], that's a tricky one. It's very much dependent on the application using the data. For example, a word-processor might use utf32[], whereas the actual text-names managed by a file-system might use utf8[] instead.
There's a considerable (relative) cost related to utf conversions. In many scenarios it just doesn't matter. For a DB front-end, it perhaps does? If you can support all three on every DB (no conversion required) then that may be an option. Otherwise, I'd be very tempted to stick with utf8[] only.
Why? It's difficult to find a yardstick everyone will agree with, but experience with Mango has indicated the "lower" the level of functionality, the more likely it is to use utf8[]. Another way to express that is, utf8[] is generic in nature, whereas utf32[] tends to be quite application-specific.
Poor old utf16[] is in a bit of a no-man's land: in comparison to the other two, it offers something very useful for only a truly small segment of applications. That is, utf16 doesn't offer the attractive direct-indexing ability that utf32 has, and it typically takes more space than utf8. For BMP-only characters, utf16 can be a good tradeoff for specific devices, such as "credit-card" language translators. |
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
|
Posted: Thu Jul 13, 2006 1:03 am Post subject: |
|
|
jpelcis wrote: | Unfortunately, that's where the trouble comes. PostgreSQL doesn't give any of that information to the application. The question is whether the user should be allowed to call the functions (and attempt conversion), whether D DBI should guess, or whether functionality should be limited to string only as it is now. |
That sucks
One solution might be to extract the DB schema out into some form that you can utilize at runtime? This would provide the missing meta data, but seems like a lot of effort.
String-only is cool for many applications, but others will pay a major performance price when indexing, for example.
Another solution might be to make part of the DB interface pluggable. Postgres gets string only, whereas others get full support? It's not ideal, but better than limiting to string only? A pluggable approach might be used to handle the meta-data noted above? |
pragma
Joined: 28 May 2004 Posts: 607 Location: Washington, DC
|
Posted: Thu Jul 13, 2006 9:55 am Post subject: |
|
|
jpelcis wrote: | kris wrote: |
The getShort() and getInt() could easily be bound together (assuming meta info is present in the row description, identifying the original type), and there's no performance penalty in doing so since a return value in D is 32bit minimum anyway ~ the language is not targeted for 16bit devices. |
Unfortunately, that's where the trouble comes. PostgreSQL doesn't give any of that information to the application. The question is whether the user should be allowed to call the functions (and attempt conversion), whether D DBI should guess, or whether functionality should be limited to string only as it is now. |
As kris said: that sucks.
Usually when you're talking to a database, the schema is known. The typical exception to this is when you're working with a database suite like TOAD, in which case it understands that it needs to dig deep in order to get metadata. IMO, one could write a rather nice extension (a layer on top rather than a plugin, as Kris mentioned) to DDBI that handles vendor-specific metadata for other tasks... but that's clearly beyond the present scope of DDBI as far as I know.
So why not err on the side of trusting the programmer, just as D itself does? Just allow the various row accessors (getInt, getString, etc.) to convert directly from binary data present in the recordset. I gather that there may be some promotion or other *small* conversion steps needed for some types (vendor-specific floating-point types, zero-terminated strings, etc), but mostly, it could be just static casts from binary data.
This can also help the string debacle: IMO, UTF transcoding shouldn't be the responsibility of a database access layer. Just allow getString()/getUtf8(), getUtf16(), and getUtf32() to cast the raw data over to the type at hand and return it to the user. To cover all the other cases, you could even provide a getRaw() that returns a void[] - just to play it safe. _________________ -- !Eric.t.Anderton at gmail |
jpelcis
Joined: 09 Mar 2006 Posts: 40
|
Posted: Thu Jul 13, 2006 9:58 am Post subject: |
|
|
kris wrote: | I was suggesting getShort() and getFloat() be removed. |
With that simple function and an explicit cast, it could give user code fewer warnings. If the function body for each is one line, there's not really much of a reason NOT to throw it in.
kris wrote: |
There's a considerable (relative) cost related to utf conversions. In many scenarios it just doesn't matter. For a DB front-end, it perhaps does? If you can support all three on every DB (no conversion required) then that may be an option. Otherwise, I'd be very tempted to stick with utf8[] only. |
And that's where the fun comes in. With some DBs, you can have your strings stored as UTF-16. If you do that, you'll do the conversion to avoid the conversion. That said, there could be versions to get utf-8, utf-16, and utf-32. Whether a conversion is required would depend on what the server returns.
kris wrote: | One solution might be to extract the DB schema out into some form that you can utilize at runtime? This would provide the missing meta data, but seems like a lot of effort. |
I'm not sure it's worth the time to do that. That said, if someone else does it, I'll put it in.
kris wrote: | String-only is cool for many applications, but others will pay a major performance price when indexing, for example.
Another solution might be to make part of the DB interface pluggable. Postgres gets string only, whereas others get full support? It's not ideal, but better than limiting to string only? A pluggable approach might be used to handle the meta-data noted above? |
I meant string only for PostgreSQL, since that's the only thing the application can know works for sure. Personally, I'm still in favor of allowing the application to request, for example, an int, but having it throw an exception if that doesn't work.
Or we could deprecate the entire PostgreSQL library and have its connections go through the ODBC library, which somehow supposedly gets the type information. Or not. |
jpelcis
Joined: 09 Mar 2006 Posts: 40
|
Posted: Thu Jul 13, 2006 10:11 am Post subject: |
|
|
pragma wrote: | As kris said: that sucks. |
Sorry about that. My clarity failed me for a moment. See my other post for what I meant.
pragma wrote: | Usually when you're talking to a database, the schema is known. The typical exception to this is when you're working with a database suite like TOAD, in which case, it understands that it needs to dig deep in order to get metadata. IMO, one could write a rather nice extension (a layer on top rather than as a plugin, as Kris mentioned) to DDBI that handles vendor-specific metadata for other tasks... but that's clearly beyond the present scope of DDBI as far as I know. |
Most databases will happily tell you what type a piece of data is. That's the whole concept behind this proposal. The problem is what to do with databases that don't say what something is.
pragma wrote: | So why not err on the side of trusting the programmer, just as D itself does? Just allow the various row accessors (getInt, getString, etc.) to convert directly from binary data present in the recordset. I gather that there may be some promotion or other *small* conversion steps needed for some types (vendor-specific floating-point types, zero-terminated strings, etc), but mostly, it could be just static casts from binary data. |
The problem is that not only do they not return the type, but they also return the data in string format (at least PostgreSQL does). If it's simply cast to the desired type, the result would be pure garbage.
pragma wrote: | This can also help the string debacle: IMO UTF transcoding shouldn't be the responsibility of a database access layer. Just allow getString()/getUtf8(), getUtf16(), and get getUtf32() to cast the raw data over to the type at hand and return it to the user. To cover all the other cases, you could even provide a getRaw() that returns a void[] - just to play it safe. |
See above for why I think that form of getUTF#() wouldn't work. If the pure binary format was returned, that would be perfect, though.
As for the getRaw(), that's essentially what I was looking at with getObject, but I think that void[] is a better type for it. My proposal has been changed to represent that.
Last edited by jpelcis on Thu Jul 13, 2006 10:35 am; edited 1 time in total |
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
|
Posted: Thu Jul 13, 2006 10:30 am Post subject: |
|
|
If Postgres is the oddball, I'd simply leave it out
You mentioned the ODBC driver may be able to provide the missing metadata? That would work too |
larsivi Site Admin
Joined: 27 Mar 2004 Posts: 453 Location: Trondheim, Norway
|
Posted: Thu Jul 13, 2006 4:18 pm Post subject: |
|
|
kris wrote: | If Postgres is the oddball, I'd simply leave it out
|
Too bad it might be the best free SQL DB out there |
Kashia
Joined: 16 Feb 2006 Posts: 6
|
Posted: Sun Jul 16, 2006 2:28 pm Post subject: |
|
|
Bah! I will cry if you leave PostgreSQL out. You just don't know how to handle the power.
To get the table oid (needed for getting other info):
Code: |
SELECT c.oid
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'pg_toast')
  AND pg_catalog.pg_table_is_visible(c.oid)
  AND c.relname = '<<<<<table name>>>>>>'
|
This returns the column names and types:
Code: |
SELECT a.attname, pg_catalog.format_type(a.atttypid, a.atttypmod)
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = '<<<<table oid>>>>'
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY a.attnum
|
You may want to combine those two, or not. Flexibility is everything
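Since both lookups filter on the same table, one way to combine them is a single round trip built from the two queries above (an untested sketch, merely joining them on the table OID):
Code: |
SELECT a.attname, pg_catalog.format_type(a.atttypid, a.atttypmod)
FROM pg_catalog.pg_attribute a
JOIN pg_catalog.pg_class c ON c.oid = a.attrelid
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'pg_toast')
  AND pg_catalog.pg_table_is_visible(c.oid)
  AND c.relname = '<<<<<table name>>>>>>'
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY a.attnum
|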
Kashia |
jeremy_c
Joined: 09 Apr 2005 Posts: 16 Location: Ohio, USA
|
Posted: Wed Nov 08, 2006 4:08 pm Post subject: |
|
|
Hello! I'll post a "I'm Back" message somewhere else, but, well, "I'm Back!"
Anyway, going to use DDBI again I really like what has been done! Good work on everyone's part. I had an idea, maybe a silly one, but I'll just say it and you guys can hash it over.
What if we made the query functions accept an optional argument specifying which Row class to use, defaulting to a simple Row class constructor? I.e., DDBI can have a simple Row class giving the basic elements, trusting the programmer for whatever. But in other scenarios where you want more power, you could inherit from an abstract Row class and do cool things like this:
Code: |
class PersonRow : BaseRow {
    char[] getFullName() { return data["fname"] ~ ' ' ~ data["lname"]; }
    AddressRow getAddress() { return queryForOne("SELECT ...", &AddressRow.Create); }
    bool update() { db.execute("..."); return true; }
}
PersonRow CreatePersonRow() { return new PersonRow(); }
PersonRow[] rows = db.queryFetchAll("SELECT * FROM person", &CreatePersonRow);
|
DDBI could even provide some advanced functionality rows if you would like to use them, like UpdatableRow which can track changes and issue UPDATE statements itself when attributes are changed, etc...
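The UpdatableRow idea might be sketched roughly like this (hypothetical names throughout: BaseRow, tableName, db, and get() are assumptions for illustration, not existing DDBI API):
Code: |
// Sketch: remember which fields changed, then UPDATE only those.
class UpdatableRow : BaseRow {
    private char[][char[]] dirty;

    void set (char[] fieldName, char[] value) {
        dirty[fieldName] = value; // record the change for later
    }

    void save () {
        char[] sql = "UPDATE " ~ tableName ~ " SET ";
        bool first = true;
        foreach (name, value; dirty) {
            if (!first) sql ~= ", ";
            sql ~= name ~ " = '" ~ value ~ "'"; // real code must escape values
            first = false;
        }
        sql ~= " WHERE id = '" ~ get("id") ~ "'";
        db.execute(sql);
        dirty = null; // row is clean again
    }
}
|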
Oh, then the people who need advanced type casting, data handling, etc... they could inherit from say BaseRow or any *Row class and go to town with all their own requirements. You could even override a function like:
Code: |
class BaseRow {
void set(char[] fieldName, char[] data) {
    switch (fieldName) {
        case "id": id = atoi(data); break;
        // ... other fields ...
    }
}
}
|
Not sure how effective that would be, but we trade performance for programmer convenience all the time and do it successfully; look at Ruby on Rails!
Oh, one more note... I agree w/Kashia... Do not throw out PostgreSQL, please!!!!!!!!
Jeremy |