Old 09-25-2005, 20:00   #1 (permalink)
Proteus
Civilians

 
Default what char encoding are plain text files?

Can someone please explain to me what type of character encoding (Unicode
UTF8, ISO whatever, etc) plain vanilla text files are (I am talking text
files as created for example with a linux vi or gedit or vim editor, the
simplest text files)?

I have been running into problems with HTML files I made, uploaded to a
proprietary Content Management System (online campus software), that has
an online html editor; when I then download my html, it seems to be funked
somehow by the online software system so that when I try to look at my
downloaded html with linux 'less' command or the vi editor, I get a
warning that it is a binary file and all I see is gibberish (binary funky
characters) rather than the text based html tags. I can still open and
view the html in Mozilla Composer, and if I save it with e.g. Unicode UTF8
character encoding I can then see it with less command or the vi editor or
some other plain text editor. I can also open the funked html in
Openoffice, where I see the html source as tags, but just before the first
<HTML> tag there are two funky binary characters (a y with two dots over
it, followed by a vertical line with a backwards c attached to it); if I
delete those two funky characters, then save the file with OpenOffice, I
can then view the saved html with vi editor, etc.
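One possible reading of the symptoms above, offered as a guess rather than a diagnosis: a "y with two dots over it" (ÿ) followed by a thorn-like character (þ) is how the bytes 0xFF 0xFE appear when displayed as ISO-8859-1, and those two bytes are the UTF-16 little-endian byte order mark. If that is what the CMS writes, less and vi would see a NUL byte after every ASCII character and call the file binary. A minimal Python sketch for checking the first bytes of a file; the function name sniff_bom and the sample bytes are made up for illustration:

```python
import codecs

# Known byte order marks, longest first so UTF-32 LE is not mistaken for UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return the encoding implied by a leading byte order mark, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

# Hypothetical sample: "<HTML>" stored as UTF-16 LE with a byte order mark.
sample = codecs.BOM_UTF16_LE + "<HTML>".encode("utf-16-le")

print(sniff_bom(sample))                       # utf-16-le
print(sniff_bom(b"<HTML>"))                    # None
print(sample[:2].decode("latin-1") == "ÿþ")    # True
```

For a real file, the same check would be sniff_bom(open(path, "rb").read(4)), where path is whatever the downloaded file is called.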

Very odd, I do not understand what is going on. If anybody can enlighten
me I will be very grateful.

 
Old 09-26-2005, 04:00   #2 (permalink)
David Dorward
Civilians

 
Default Re: what char encoding are plain text files?

Proteus wrote:

> Can someone please explain to me what type of character encoding (Unicode
> UTF8, ISO whatever, etc) plain vanilla text files are (I am talking text
> files as created for example with a linux vi or gedit or vim editor, the
> simplest text files)?


It generally depends on the locale settings of the system.
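As a quick way to see what the locale implies on a given machine, the snippet below asks Python for the locale's preferred text encoding. Python is used here only as a convenient way to ask; an editor may apply its own settings on top of this, so treat the result as a hint about the default, not a guarantee.

```python
import locale

# Encoding the current locale prefers for text data; the value depends on
# the system, for example "UTF-8" or "ISO-8859-1".
preferred = locale.getpreferredencoding(False)
print(preferred)
```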

--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Home is where the ~/.bashrc is
 
Old 09-26-2005, 08:00   #3 (permalink)
Toby Inkster
Civilians

 
Default Re: what char encoding are plain text files?

Proteus wrote:

> Can someone please explain to me what type of character encoding plain
> vanilla text files are


That depends on what character encoding the files are in.

Seriously, it's like asking "how long is a piece of string?" The answer is
that it depends on how long the string is.

> (I am talking text files as created for example with a linux vi or gedit
> or vim editor, the simplest text files)?


Many text editors will offer you the opportunity to choose a character set
when you save the file. If you are using an English-language operating
system, the editor will probably *default* to iso-8859-1, iso-8859-15,
utf-8 or us-ascii, but may automatically select a different character set
if you use characters that are unavailable in the default.
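The fallback behaviour described above can be sketched roughly as follows. This is not how any particular editor is implemented; the function name pick_encoding and the candidate order are assumptions chosen to mirror the list of likely defaults.

```python
def pick_encoding(text, candidates=("us-ascii", "iso-8859-1", "iso-8859-15", "utf-8")):
    """Return the first candidate encoding that can represent every character."""
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # some character is unavailable in this encoding
        return name
    return None

print(pick_encoding("plain text"))   # us-ascii
print(pick_encoding("café"))         # iso-8859-1
print(pick_encoding("5 €"))          # iso-8859-15
print(pick_encoding("日本語"))        # utf-8
```

The euro sign is the classic case: it does not exist in iso-8859-1, so the sketch moves on to iso-8859-15.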

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

 
Old 09-26-2005, 12:00   #4 (permalink)
Proteus
Civilians

 
Default Re: what char encoding are plain text files?

On Mon, 26 Sep 2005 0844 +0100, Toby Inkster wrote:
...
> That depends on what character encoding the files are in.
>..


Ok, fair answer. Then is there some utility or way to easily determine
what type of char encoding a text file is in? I mean, if I have
somefile.txt or somewebpage.html, how can I know what char encoding is
embedded in the file? Is there some utility (hopefully in linux) to look
at the type of encoding used?

 
Old 09-26-2005, 20:00   #5 (permalink)
Proteus
Civilians

 
Default Re: what char encoding are plain text files?

On Mon, 26 Sep 2005 16:51:33 +0100, Brian Wakem wrote:
...
> $ file ./*
> ./1123499855.671.doc: Microsoft Office Document
> ./domains: ASCII text
> ./mbox: ASCII mail text..


Interesting. For html docs though the file command just shows it as HTML,
no char encoding listed; not even if I rename the .html to .html.txt
But that is a nice utility to know.
 
Old 09-26-2005, 20:00   #6 (permalink)
Brian Wakem
Civilians

 
Default Re: what char encoding are plain text files?

Proteus wrote:

> On Mon, 26 Sep 2005 16:51:33 +0100, Brian Wakem wrote:
> ..
>> $ file ./*
>> ./1123499855.671.doc: Microsoft Office Document
>> ./domains: ASCII text
>> ./mbox: ASCII mail text..

>
> Interesting. For html docs though the file command just shows it as HTML,
> no char encoding listed; not even if I rename the .html to .html.txt
> But that is a nice utility to know.



Use the -i flag.

If it still just says text/html, then it's plain ASCII.
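The "plain ASCII" case can also be cross-checked by hand: a file is pure ASCII when every byte is below 0x80. A small Python sketch, independent of the file command; the function name looks_like_ascii and the sample strings are illustrative only:

```python
def looks_like_ascii(data: bytes) -> bool:
    """True when every byte is in the 7-bit ASCII range (0x00-0x7F)."""
    return all(b < 0x80 for b in data)

print(looks_like_ascii(b"<html><body>hello</body></html>"))   # True
print(looks_like_ascii("café".encode("utf-8")))               # False
print(looks_like_ascii("café".encode("iso-8859-1")))          # False
```

A False result only says the file is not ASCII; it does not say which of the other encodings it is.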


--
Brian Wakem
Email: http://homepage.ntlworld.com/b.wakem/myemail.png
 
Old 09-26-2005, 20:00   #7 (permalink)
Doc O'Leary
Civilians

 
Default Re: what char encoding are plain text files?

In article <pan.2005.09.26.14.44.34.625071@uselessemail.net >,
Proteus <proteus@uselessemail.net> wrote:

> Ok, fair answer. Then is there some utility or way to easily determine
> what type of char encoding a text file is in? I mean, if I have
> somefile.txt or somewebpage.html, how can I know what char encoding is
> embedded in the file? Is there some utility (hopefully in linux) to look
> at the type of encoding used?


No. Bits are just bits if there is no metadata that tells you the
encoding. In one text encoding a certain bit sequence might be a bullet
point and in another it might be the symbol for the British Pound. The
best a computer could do is the best a human can do: look at any
particular encoding and say it's *probably* wrong, but that doesn't get
you to what encoding is *definitely* right. You really need the author
to add that metadata if you want it to be clear.
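To illustrate the point with a concrete byte: 0xA4 is the generic currency sign in ISO-8859-1 and the euro sign in ISO-8859-15, and nothing in the byte itself says which was meant. The Python sketch below can rule encodings out (the bytes are not valid UTF-8) but cannot pick a winner among the rest. The function name plausible_encodings and the sample bytes are made up for the example.

```python
def plausible_encodings(data, candidates=("utf-8", "iso-8859-1", "iso-8859-15")):
    """Return the candidates that decode the bytes without error."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding is definitely wrong for these bytes
        ok.append(name)
    return ok

raw = b"price: \xa4 5"

print(plausible_encodings(raw))                      # ['iso-8859-1', 'iso-8859-15']
print(raw.decode("iso-8859-1") == "price: ¤ 5")      # True
print(raw.decode("iso-8859-15") == "price: € 5")     # True
```

Both surviving decodings succeed and give different text, which is why a declared charset is needed to be sure.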
 