| Web Design Forums and discussions on webdesign |
| | #1 (permalink) |
| Civilians | Can someone please explain to me what type of character encoding (Unicode UTF-8, ISO whatever, etc.) plain vanilla text files are? (I am talking about text files as created, for example, with a Linux vi, gedit or vim editor, the simplest text files.) I have been running into problems with HTML files I made and uploaded to a proprietary Content Management System (online campus software) that has an online HTML editor. When I then download my HTML, it seems to be funked somehow by the online software system, so that when I try to look at the downloaded HTML with the Linux 'less' command or the vi editor, I get a warning that it is a binary file, and all I see is gibberish (binary funky characters) rather than the text-based HTML tags. I can still open and view the HTML in Mozilla Composer, and if I save it with e.g. Unicode UTF-8 character encoding I can then see it with the less command, the vi editor or some other plain text editor. I can also open the funked HTML in OpenOffice, where I see the HTML source as tags, but just before the first <HTML> tag there are two funky binary characters (a y with two dots over it, followed by a vertical line with a backwards c attached to it); if I delete those two funky characters and then save the file with OpenOffice, I can view the saved HTML with the vi editor, etc. Very odd; I do not understand what is going on. If anybody can enlighten me I will be very grateful. |
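The two stray characters described in this post (a y with two dots, then a thorn-like letter) match how the bytes FF FE look when read as ISO-8859-1, and FF FE is the little-endian UTF-16 byte order mark. That would also account for the "binary file" warning, since UTF-16 puts a NUL byte next to every ASCII character. A minimal Python sketch of that reading follows; it assumes the CMS re-saved the page as UTF-16, which the thread never confirms, and the sample bytes are made up for illustration.

```python
import codecs

# Hypothetical stand-in for the downloaded file: "<HTML>" stored as UTF-16
# with a little-endian byte order mark (BOM) in front.
raw = codecs.BOM_UTF16_LE + "<HTML>".encode("utf-16-le")

# Read as ISO-8859-1, the BOM bytes FF FE appear as two odd characters.
as_latin1 = raw.decode("iso-8859-1")
print(repr(as_latin1[:2]))

# Each ASCII character is paired with a NUL byte, which is the kind of
# content that makes tools such as less report a binary file.
nul_count = raw.count(b"\x00")
print(nul_count)

# Decoding with the matching codec consumes the BOM and recovers the text,
# which can then be re-saved as UTF-8.
text = raw.decode("utf-16")
utf8_bytes = text.encode("utf-8")
print(text, utf8_bytes)
```

If this is what happened, re-saving as UTF-8 (as Mozilla Composer did) or deleting the mark and saving in an 8-bit encoding (as OpenOffice did) would both produce a file that vi and less display normally.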
|
| | #2 (permalink) |
| Civilians | Proteus wrote: > Can someone please explain to me what type of character encoding (Unicode UTF8, ISO whatever, etc) plain vanilla text files are (I am talking text files as created for example with a linux vi or gedit or vim editor, the simplest text files)? It generally depends on the locale settings of the system. -- David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/> Home is where the ~/.bashrc is |
|
| | #3 (permalink) |
| Civilians | Proteus wrote: > Can someone please explain to me what type of character encoding plain vanilla text files are That depends on what character encoding the files are in. Seriously, it's like asking "how long is a piece of string?" The answer is that it depends on how long the string is. > (I am talking text files as created for example with a linux vi or gedit or vim editor, the simplest text files)? Many text editors will offer you the opportunity to choose a character set when you save the file. If you are using an English-language operating system, the editor will probably *default* to iso-8859-1, iso-8859-15, utf-8 or us-ascii, but may automatically select a different character set if you use characters that are unavailable in the default. -- Toby A Inkster BSc (Hons) ARCS Contact Me ~ http://tobyinkster.co.uk/contact |
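The point about an editor falling back to another character set when the default cannot represent a character can be sketched in Python. The function name and the candidate order below are assumptions chosen for illustration; real editors apply their own rules.

```python
def first_fitting_charset(text, candidates=("us-ascii", "iso-8859-1",
                                            "iso-8859-15", "utf-8")):
    """Return (charset, encoded bytes) for the first candidate that can
    represent every character in text.

    Hypothetical helper: it only mimics the idea of an editor picking a
    different character set when the default lacks a character.
    """
    for name in candidates:
        try:
            return name, text.encode(name)
        except UnicodeEncodeError:
            continue  # this charset has no code for some character
    raise ValueError("no candidate charset can encode the text")


print(first_fitting_charset("plain"))  # fits in us-ascii
print(first_fitting_charset("café"))   # needs an 8-bit charset for é
print(first_fitting_charset("5 €"))    # the euro sign is not in iso-8859-1
print(first_fitting_charset("日本"))    # only utf-8 in this list can encode it
```

The same text therefore ends up as different bytes depending on which charset was chosen, and nothing in the saved bytes records that choice.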
|
| | #4 (permalink) |
| Civilians | On Mon, 26 Sep 2005 08 44 +0100, Toby Inkster wrote:... > That depends on what character encoding the files are in. >.. Ok, fair answer. Then is there some utility or way to easily determine what type of char encoding a text file is in? I mean, if I have somefile.txt or somewebpage.html, how can I know what char encoding is embedded in the file? Is there some utility (hopefully in linux) to look at the type of encoding used? |
|
| | #5 (permalink) |
| Civilians | On Mon, 26 Sep 2005 16:51:33 +0100, Brian Wakem wrote: ... > $ file ./* > ./1123499855.671.doc: Microsoft Office Document > ./domains: ASCII text > ./mbox: ASCII mail text.. Interesting. For html docs though the file command just shows it as HTML, no char encoding listed; not even if I rename the .html to .html.txt But that is a nice utility to know. |
|
| | #6 (permalink) |
| Civilians | Proteus wrote: > On Mon, 26 Sep 2005 16:51:33 +0100, Brian Wakem wrote: > .. >> $ file ./* >> ./1123499855.671.doc: Microsoft Office Document >> ./domains: ASCII text >> ./mbox: ASCII mail text.. > > Interesting. For html docs though the file command just shows it as HTML, > no char encoding listed; not even if I rename the .html to .html.txt > But that is a nice utility to know. Use the -i flag. If it still just says text/html, then it's plain ASCII. -- Brian Wakem Email: http://homepage.ntlworld.com/b.wakem/myemail.png |
|
| | #7 (permalink) |
| Civilians | In article <pan.2005.09.26.14.44.34.625071@uselessemail.net>, Proteus <proteus@uselessemail.net> wrote: > Ok, fair answer. Then is there some utility or way to easily determine what type of char encoding a text file is in? I mean, if I have somefile.txt or somewebpage.html, how can I know what char encoding is embedded in the file? Is there some utility (hopefully in linux) to look at the type of encoding used? No. Bits are just bits if there is no metadata that tells you the encoding. In one text encoding a certain bit sequence might be a bullet point and in another it might be the symbol for the British Pound. The best a computer could do is the best a human can do: look at any particular encoding and say it's *probably* wrong, but that doesn't get you to what encoding is *definitely* right. You really need the author to add that metadata if you want it to be clear. |
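The "probably wrong, never definitely right" point can be made concrete with a rough Python sketch. The function below is a hypothetical heuristic, not an existing utility: it looks for a byte order mark, then rules out encodings that fail to decode, and otherwise falls back to a charset that can never be ruled out.

```python
import codecs

# Byte order marks, longest first so that the UTF-32 LE mark (FF FE 00 00)
# is checked before the UTF-16 LE mark (FF FE) it starts with.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes) -> str:
    """Return a plausible codec name for data. This is a guess, not a proof."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # A failed decode shows an encoding is wrong; success only means "possible".
    for name in ("ascii", "utf-8"):
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    # Every byte sequence decodes as ISO-8859-1, so it can be neither
    # ruled out nor confirmed from the bytes alone.
    return "iso-8859-1"


print(guess_encoding(b"<html>"))
print(guess_encoding("café".encode("utf-8")))
print(guess_encoding("café".encode("iso-8859-1")))

# The same single byte is a bullet in Mac OS Roman and a yen sign in ISO-8859-1.
ambiguous = b"\xa5"
print(ambiguous.decode("mac_roman"), ambiguous.decode("iso-8859-1"))
```

For HTML, the metadata mentioned in this post would typically be a charset declared in the HTTP Content-Type header or in a meta element, which saves the reader from guessing.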
|