TI-IT::Tech-Ideas for IT

Saturday, February 15, 2014

Weekend productivity tools (PicPick)

The weekend arrives: the spare time you've been waiting for all through the hard-working week...

Yes, a bit of time for yourself, or so you expect... Picture the scene: you are about to read those interesting things you've been collecting in your reading wish-list when one of your lovely sons or daughters comes in asking for help with their homework... Back to work!!!

As you might guess, today's homework is shaped by new technologies: cloud platforms for distributing and collecting information and assignments, word processing tools, animated presentations... Daddy, can you help me?

It's amazing how naturally children nowadays have developed the ability to use such tools. I must say, I've seen them prepare really attractive presentations (better than mine, uhmm.., which is not very difficult ;)). Thank goodness I can still contribute my experience (life is life) with tools, tricks and tips!!!

We were busy with these duties when I came across what I think is a very helpful tool: PicPick for Windows. Important advice: I strongly recommend declining the other applications bundled in the installer.

As stated on its homepage, this is an “all-in-one” design tool for everyone: full-featured screen capture tool, intuitive image editor, color picker, color palette, pixel-ruler, protractor, crosshair, whiteboard and more…

PicPick's features in the tray icon menu

As far as I have tried it, I can say that it’s a lightweight and very easy-to-use tool that exposes all its features through a tray icon you can access quickly. You can configure it to start automatically with your session or launch it when you need it.

Its first and main feature is screen capture, which can be done directly on the screen (by means of a keyboard shortcut) by fencing off the desired area and sending it straight to the clipboard. You must change the configuration for this, as by default captures open in a Paint-like editor included in PicPick. This avoids the two or three steps I used to go through before (Print Screen + paste into Paint + cut and copy…).

The Color Palette and Color Picker are also very useful if you have to define colors in your presentations, or in development tasks when configuring stylesheets.

Although I haven’t tried it in a real situation, the Whiteboard seems very helpful when you need to explain or teach something on top of a running application.

PicPick is free for home users. The price for commercial use is around $25.

Let me know if this post has interested you, and if you know of a similar open source alternative, or at least a fully freeware one, please let us know!

Sunday, February 09, 2014

On UML modeling tools and other related matters

The need to use UML

Throughout the projects I have worked on in my professional career, the need to use UML (Unified Modeling Language) notation has come up to a greater or lesser extent, mainly in the following scenarios:

Requirements: mainly through use case diagrams for capturing functional requirements, and state machine and activity diagrams for understanding processes and workflows.
Functional design: package diagrams for the functional decomposition of the solution into packages or modules, and use case diagrams for specifying the functionality that the solution we were building should offer.

Use case diagram

Design of the solution and its architecture: the class diagram being the most important, along with others such as component diagrams, sequence diagrams, etc., depending on the specific needs of each project.

Class diagram

This need arises in projects with different approaches and is not at odds with the type of methodology applied in them: both in projects that called for "waterfall" or "V-model" development and in projects that applied agile methodologies, which do not require a complete design prior to development but in which the use of UML nevertheless remains valid and necessary.

What I have used so far...

It was the mid-nineties when the three amigos developed (integrated) their best proposals to give rise to the UML notation, and they also accompanied it with a development methodology known as RUP (Rational Unified Process), which takes its name precisely from the company where they made their mark: Rational Software.

Rational Rose was the reference tool back then, because it carried the seal of the creators themselves and offered a high degree of quality and completeness in supporting the different elements of the UML notation as well as the RUP process.

My experience with this tool is nonexistent beyond having opened it a couple of times or read the odd guide or tutorial about it. This is perhaps because, when we started using these techniques, acquiring a tool with a "considerable" price was not justified given the use it was going to get: just a few diagrams, used partially, and in principle nothing related to RUP.

On the other hand, we had licenses for Microsoft office and development products, among them Microsoft Visio, both as part of the Office family and in the ...for enterprise architects editions (in its 2003 version) included in the Visual Studio .NET 2003 and Visual Studio 2005 development solutions.

In my opinion MS Visio is a magnificent tool (already in its 2003 version, and still today) for its versatility in producing all kinds of diagrams and for its extensibility (through a COM object model, not very appealing today).

It also looked like Microsoft's answer for complementing its Visual Studio development tools through the ...for enterprise architects editions, with their reverse engineering and code generation capabilities (not only for UML but also for database modeling, with physical ER models obtainable by reverse engineering and the possibility of generating physical implementations through DDL scripts).

Unfortunately, it seemed this was not going to be Microsoft's definitive bet, as became clear when they stopped including it in subsequent versions of Visual Studio (and it could no longer be installed unless we kept the earlier VS versions!) and when we saw that the latest versions of the UML notation were not being implemented (we were able to mitigate this last drawback to some extent thanks to Visio's own configuration capabilities).

Even so, Visio is the tool I have used most to cover the needs I mentioned at the beginning, when it came to documenting or reverse engineering code or databases.

With Visual Studio 2010, Microsoft introduced a new set of tools fully integrated into the development environment that also offer synchronization between the code and architecture components and their respective models. Unfortunately, these capabilities are only available in the Ultimate edition, which makes them completely unaffordable except for maintaining a couple of licenses (and not even that for every company/organization).

Personally, I believe modeling tools must be accessible to the whole development team, which has to use them to a greater or lesser extent depending on each member's role, and for the reason I mentioned I find it very unlikely that MS's offering will be the tool in which I decide to invest my efforts...

And now what?

Do I keep making do with Visio, or do we look for an alternative? In the next post I will try to answer that question by discussing several Free Software alternatives I have been analyzing these days.

I look forward to your comments!

Monday, April 29, 2013

Java: checked vs unchecked exceptions

Warning: first Java-related post!!!

Taking my first steps with Eclipse, I notice that the environment helps by telling us that certain methods may throw exceptions and that these must either be caught or declared in the method signature with the throws keyword (and if we ignore it, we get the corresponding compilation errors). Great.

I press on with my arduous task and notice that it does not always behave the same way: there are methods that can indeed throw exceptions but about which "I am not warned". In other words, we are not forced to handle them, and they can flow up through a call stack without being either handled or declared...

I dig a little deeper and run into the concept of unchecked exceptions, associated with the RuntimeException class (and also with the Error class, a "very bad" thing). It is new to me, since in my journey through .NET/C# every exception was unchecked by nature.
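The difference can be seen in a few lines. The following is only a minimal sketch: the class and method names (CheckedVsUnchecked, readConfig, parsePositive) are made up for illustration. IOException is a checked exception, so the compiler demands a throws clause or a catch; IllegalArgumentException derives from RuntimeException, so the compiler demands nothing.

```java
import java.io.IOException;

public class CheckedVsUnchecked {

    // Checked: IOException extends Exception (not RuntimeException),
    // so it must be declared with "throws" or caught by the caller.
    static void readConfig(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("config not found");
        }
    }

    // Unchecked: IllegalArgumentException extends RuntimeException,
    // so no "throws" clause is required and callers are not forced to catch it.
    static int parsePositive(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("negative: " + n);
        }
        return n;
    }

    // Calls both methods and records which exceptions were seen.
    static String run() {
        StringBuilder log = new StringBuilder();
        try {
            readConfig(true); // removing this try/catch would be a compile error
        } catch (IOException e) {
            log.append("checked:").append(e.getMessage());
        }
        try {
            parsePositive(-1); // this try/catch is optional for the compiler
        } catch (IllegalArgumentException e) {
            log.append(";unchecked:").append(e.getMessage());
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

If the call to parsePositive were left outside any try block, the code would still compile and the exception would simply travel up the call stack.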

I hope the following diagram helps to explain it. Click here if you really want to find out ;-).

Although I am tempted to do so, I will not dare in this post to lay down criteria on which type of exception (checked vs unchecked) the exceptions we define in our system should inherit from (well, just a little: although it is commonly said that unchecked exceptions correspond to situations resulting from programming errors... I am going to "propose" that situations which do not require immediate attention, or attention close to their origin, also be classified that way ;-)).

In the meantime, to whet your appetite:

Google it: Java: checked vs unchecked exceptions

Sunday, February 19, 2012

Agile processes vs civil engineering

In this post I am going to refer to an interview conducted by Marino Posadas in issue #87 of dNM (formerly DotNetManía) with Eric Evans, Udi Dahan and Diego Vega, in the context of the DDD (Domain Driven Design) Conference on Enterprise Application Architectures (inaugural event of the IASA-Spain Chapter, Madrid, 7 Nov. 2011).

Although the news item/interview is a while old now, the underlying topic is a "classic" interesting enough to be discussed and revisited, or at least it seems so to me ;).

Specifically, a question addressed to Eric Evans ("father" of DDD, a design philosophy for complex software systems, and one of the great authors and authorities in Software Engineering) whose answer is worth reading in full.

The question (quoting the magazine): Which would you say is the best approach to analysis today? CMMI or agile technologies?

Eric Evans answers that he tries not to be very imperative about the process, but that he must add that you need to be able to iterate, to go through the process several times. To go back and review what has been done, or add new features.

What is pleasing about the answer is how it frames the topic, going to the heart of the matter by answering about the Process, that "whole" within which the different "parts" mentioned in the question are framed.

Once again it is a pleasant surprise to see the prudent attitude (the humility of the wise) he shows in saying that there is no word of God on this (not being imperative)...

...although from the very first moment he leaves no doubt about his commitment to an iterative process, a fundamental characteristic of agile methodologies.

EE goes on to explain that modeling (referring to the essence of his paradigm, which seeks a domain model for our business problem) is a learning process, and adds: if you try to have it totally complete from the start, the result is not going to be very good. And if you spend a lot of time at the beginning trying to refine the model, you are still going to have problems.

EE again alludes to another mantra of agile methodologies: the failure of trying to apply civil engineering processes to the letter in software engineering (tables of weights and measures, V-model, waterfall, etc.). It is also interesting to note that EE is no extremist, since he implicitly acknowledges the value of design phases while making it clear that a heavy investment in them is no guarantee of success.

He goes on to give his advice: It is better to start and learn from the process, and as you move forward you keep learning and changing the model again and again, in a process of progressive refinement.

That is, iterative and incremental development based on "emergent" modeling / design.

That is the only way I have seen genuinely high-quality models at work. And that is one of the things the agile model emphasizes most clearly.

The master concludes his answer by declaring: In short, although I don't think they are a panacea, agile technologies seem more valid to me, insofar as they reinforce these principles.

Once again, without falling into the temptation of dogmatizing, he makes his opinion clear: in this complex business of software development projects, where the risks of overruns, dissatisfaction and financial losses lurk, agile methodologies are our best nautical chart and compass for trying to reach a safe harbor (and if the crew is experienced... so much the better).

You can watch the video of the interview here.

Friday, February 03, 2012

One single voice: Scrum and textile swimming

Often, when applying certain systems, practices or methods that we acknowledge beforehand as valid and beneficial, we may be tempted to make certain adjustments that remove from the equation the variables that bother us or make us uncomfortable...

Those aspects that strike us as dispensable, or that we even think are better eliminated, applying the method in a "Protestant" or free-interpretation fashion that of course allows us to "improve" it. I'll keep this bit because it's cool, not that bit, what a bore, or it doesn't suit me right now, and if I do it this way or that way it fits me perfectly... oops! Perhaps somebody is starting to think of that famous textile discipline of swimming, also known as swimming while keeping an eye on your clothes (that is, wanting to have it both ways).

As the colloquial Spanish expression goes, "just suppose": take a famous methodology, namely Scrum, and take one of its axioms, basic principles and cardinal rules: One single voice.

How profound it sounds... I'm sure it rings a bell for those of you who have attended a course by one of the gurus on the subject (how they love repeating it, the bores; evidently they do it because it makes them look like real show-offs in front of the audience).

That is, these Scruntx people keep going on about how there must be a single chain of command that builds the product backlog and sets the priorities (and, mind you, that also takes responsibility for making sure the functional fuel that keeps the team running isn't low on octane; it can't all be giving orders, man!). Those of you from the course already know who I'm talking about: the Product Owner, although I think it's cooler to say the productouner (another day we'll talk about the guachimanes...)

Well then, in our search for perfection and a better adaptation of the method, we may decide to have a couple of productouners: that way, if one is on vacation the other covers for them, and perhaps one of them also gets along better with some team members, or perhaps one knows more than the other about the business or the technology, which would be great because they could complement each other.

Besides, the team can ask both, like a second medical opinion, and see which answer suits it best given what it has already developed, etc., and so be more efficient in its work.

In short, it's all advantages; I don't know how the inventor didn't think of it... But wait... somebody said something about an axiom or basic rule... could it be that we break something by tinkering with this...? I don't know, maybe we leave it alone and look for other areas to improve... perhaps in Kanban, where the productouner thing isn't so well defined... great!

Friday, November 25, 2011

Oracle RAC vs SQL-Server

I'm sure you all know about Oracle RAC (Real Application Clusters): Oracle's solution for high availability (fault tolerance + load balancing), with multiple active nodes working on the same database, that is, an infinite number of active nodes working together at the same time to achieve infinite processing and storage capabilities (by adding as many nodes as you need to the cluster and as many disks as you need to your SAN)... that's more or less the commercial message of Oracle's representatives.

That is a long way from what Microsoft says in this paper, WhyNotOracleRAC (what a title!).

The executive summary gets to the point: almost nobody uses RAC because it is expensive and complex to maintain, deploy and troubleshoot...

Instead, MS states that "the Microsoft® SQL Server® database program represents a wiser investment because it can meet the same requirements as an equivalent Oracle RAC installation at a much lower cost" with an SMP (symmetric multiprocessing) approach, that is, powerful CPUs with lots of cores...

First comments:
-MS mostly gives up on parallelism based on separate machines... perhaps a correct decision nowadays, but who knows tomorrow; it doesn't seem very wise to leave this door closed.

-In the Active/Passive cluster scenario (the most used in SQL Server) you would have one of your nodes "stopped", without getting any benefit from it (and somebody would say: who cares, especially if in the end you get more for less...).

I was surprised to read here that Oracle doesn't scale well in OLTP systems (large numbers of updates)... and that it does even worse in OLAP/data warehouse workloads due to the overhead required for state synchronization among the cluster's nodes...!!!

That is, if you trust MS, RAC, besides being expensive and complex, won't provide the benefit it is supposed to offer (the highest processing power). They also cite an independent study which shows that RAC doesn't scale linearly and that, to really benefit from it, you have to code your applications in a very particular manner so that they fit the RAC architecture and avoid ending up with worse performance than you would get with a more common deployment architecture.

Another thing to take into account is the failover recovery times when a node of the cluster fails:

Technology                        Recovery time (approximate)

Oracle RAC                        30 – 60 seconds
SQL Server Database Mirroring     <45 seconds
SQL Server Failover Clustering    Minutes

Oracle: Surprise, surprise, in RAC there are also recovery times that affect the whole system: first to detect the failing node, then to readjust parameters for the remaining ones and clean up all the hung DB work that was being processed on the failed node...

It would also be a mistake to suppose that RAC's load balancing capabilities are enough to transparently isolate your application from the failure of one of the nodes. The reality is that you still have to handle it in your application if you want to prevent users from noticing these situations (and not without some effort).

Anyway, the recovery times for SQL Server Failover Clustering (minutes) are not very attractive either (and, as said, it is the most common way to gain fault tolerance with SQL Server because of its simplicity and the safety of data integrity).

That is, in general we can say that RAC beats MS on this point.

Finally:

It would be nice to read an Oracle paper written from the opposite bank of the river, telling funny things about the Redmond guys. Perhaps such a paper doesn't exist; who at Oracle cares about SQL Server... ;)?

And, as you know, there are lots of aspects and features to take into account when deciding which will be the better DB solution for an organization...

Sunday, February 27, 2011

Cloud computing: Something worthy for property professionals

In this article you can see an example of how many organisations are considering the benefits of moving to the Cloud. That is, it's not very difficult for anyone (not only gurus) to become aware of them.

That's the case of the Royal Institution of Chartered Surveyors (RICS), which is asking its followers to consider adopting cloud computing because of its ease of remote access and lower operating costs, among other reasons.

Saturday, February 26, 2011

Magical application development tools

Have you ever heard of this type of system?

I knew of one in which you first had to define a model with its modelling tool. Code was then generated automatically from the model, supposedly working and fulfilling your business needs.

This one claims to be able to create a customizable Web 2.0 app from an existing database model.

Does anybody have real experience with it or with a similar one?

Tuesday, January 11, 2011

It's hot but not yet (Entity Framework provider for Oracle from Oracle)

Oracle announces the release of the latest version (11.2 R3) of its ODAC connectivity components for the .NET platform.

Unfortunately it does not yet include their long-awaited .NET Entity Framework provider, BUT they state that it is "coming soon" in a separate beta. Yes!

Let's wait a while, hoping for a short "coming soon".

Stay tuned at the Oracle .NET Developer Center.

Friday, October 23, 2009

Mono 2.4 has been released!

Sometimes, when other boring things give me a rest and it comes to mind (thanks to my friend Roger this time), I like to have a look at the Mono project.

I won't discuss the maturity or suitability of this platform in comparison with native Microsoft Windows based solutions (beware of Silverlight!!!), but there is something in the self-presentation you can read on the Mono site that leads me to meditation (or, to be more dotnetian, "reflection" ;)).

That is:
Mono 2.4 has been released! The Mono Project aims to make developers productive and happy: Mono 2.4 is our gift to the world. Sponsored by Novell, the Mono open source project has an active and enthusiastic contributing community and is positioned to become the leading choice for development of Linux applications.

Every day Microsoft technologies lose more and more ground in projects for public administrations and governments, ground that is gained by other supposedly open technologies in the name of freedom, zero cost, etc. (Is Java from Sun so open? Which RDBMSs are used in those open projects (often Oracle)? We are often facing merely electoral reasons.)

Moreover, Java is the preferred language taught in universities (Microsoft starts losing the war in the first battles, as the fresh workforce arrives on the market with no knowledge of dotnet).

So if Mono's aim of becoming the leading choice for development of Linux applications were reached, it would achieve a big part of what Microsoft hasn't yet: spreading dotnet and gaining followers across the world of IT solutions.

Saturday, March 07, 2009

IDisposable & "using" statement in C#

Are you using the using statement in your C# code?

What is it intended for?

Let's have a look at the official reference: Provides a convenient syntax that ensures the correct use of IDisposable objects.

And the sample code...

using (Font font1 = new Font("Arial", 10.0f))
{
    byte charset = font1.GdiCharSet;
}

And the "must read" Remarks: File and Font are examples of managed types that access unmanaged resources (in this case file handles and device contexts). There are many other kinds of unmanaged resources and class library types that encapsulate them. All such types must implement the IDisposable interface.

As a rule, when you use an IDisposable object, you should declare and instantiate it in a using statement. The using statement calls the Dispose method on the object in the correct way, and (when you use it as shown earlier) it also causes the object itself to go out of scope as soon as Dispose is called. Within the using block, the object is read-only and cannot be modified or reassigned.

The using statement ensures that Dispose is called even if an exception occurs while you are calling methods on the object. You can achieve the same result by putting the object inside a try block and then calling Dispose in a finally block; in fact, this is how the using statement is translated by the compiler (as you can read in the official reference).
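To keep all the examples on this blog in one language, here is the same idea in Java rather than C#. Java's try-with-resources statement over an AutoCloseable is a close analogue of C#'s using over an IDisposable, although it is not the same construct. The sketch below uses a made-up TrackedResource class and compares the compact form with a hand-written try/finally, checking that the resource is released in both cases even though an exception is thrown.

```java
public class UsingAnalogue {

    // Counts how many times any TrackedResource has been closed.
    static int closeCount = 0;

    // Hypothetical resource; plays the role an IDisposable type plays in C#.
    static class TrackedResource implements AutoCloseable {
        void work(boolean fail) {
            if (fail) {
                throw new IllegalStateException("boom");
            }
        }

        @Override
        public void close() {
            closeCount++;
        }
    }

    // Compact form: close() is called automatically, even when work() throws.
    static boolean closedWithTryWithResources() {
        int before = closeCount;
        try (TrackedResource r = new TrackedResource()) {
            r.work(true);
        } catch (IllegalStateException e) {
            // by the time we get here, close() has already run
        }
        return closeCount == before + 1;
    }

    // Hand-written form: the same guarantee spelled out with try/finally.
    static boolean closedWithTryFinally() {
        int before = closeCount;
        TrackedResource r = new TrackedResource();
        try {
            r.work(true);
        } catch (IllegalStateException e) {
            // exception handled; finally still runs
        } finally {
            r.close();
        }
        return closeCount == before + 1;
    }

    public static void main(String[] args) {
        System.out.println(closedWithTryWithResources());
        System.out.println(closedWithTryFinally());
    }
}
```

Both methods return true: the cleanup runs whether or not the body fails, which is the whole point of the construct.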

It also explains that you can declare and instantiate more than one object in the same using statement. And it reminds us that, although possible, it is bad practice to instantiate an object before the using statement and then pass it in, since such an object would still exist after the using block's scope but its unmanaged resources would already be disposed, which in practice invalidates the object and creates an error-prone situation.

Let's assume we all believe that using the "using" statement is a good practice (not everybody agrees; you can find different opinions on this matter here), a rule according to Microsoft's reference, and therefore that it should always be applied in our code (including those cases in which wizard-generated code has an empty "Dispose" method; perhaps in future versions of that code it won't be empty).

But the question is, how can a programmer be aware of disposable classes in order to code the "using" statement?

One alternative is simply to know it yourself, based on your knowledge and mastery of the .NET Framework... not a very realistic approach, taking into account that you should extend it to any framework, class library or piece of code that gets into your hands...

You can also take advantage of Visual Studio's IntelliSense or take a look at the class browser to see whether a class you are about to use implements the "Dispose" method. Extra work. At least it can be a way of improving your mastery ;).

So, is there no systematic and reliable way to be warned when you forget to dispose of your objects??? Yes, there is, or rather, there was...

In VS2005's Code Analysis (aka FxCop) there is a rule explicitly intended to obtain the desired help:

CA2000 - DisposeObjectsBeforeLosingScope

... a rule gone with the wind and no longer present in VS2008, along with a few more, as you can read in Neno Loje's blog, :(.

Read about the reasons in the Visual Studio Code Analysis Team Blog; you'll see that this rule disappeared with the removal of one of the analysis engines (you'll also find there an availability matrix of the rules, in Excel format, for the different versions of VS and FxCop):

Analysis engine removed. In Visual Studio 2008 and FxCop 1.36 we removed one of our analysis engines. This engine was removed for a variety of reasons; it increased analysis time (although the engine encompassed less than 5% our analysis, it took up 50% of our time-to-analyze), indeterministic results (results appearing and disappearing between runs), and bugs found within the engine (and hence the rules that depended on it) required huge architectural changes. We instead decided to invest the resources that we would have spent on fixing the old engine, on a new data flow analysis engine based on Phoenix, which we will ship in a future version of Visual Studio.

Not very pleasing news... we'll have to wait... perhaps third-party products? Hmm... :(.

Saturday, February 21, 2009

NUMBER vs NUMBER(p, s) (Oracle 11g)

Or how to choose one of them for the numeric fields of your tables...

As I am sure you know, Oracle (11g RDBMS) offers the NUMBER datatype as the main choice for storing numerical values in your tables (equivalent to the decimal/numeric types of SQL Server).

Although it is not the main purpose of this article, I assume that you know the difference between decimal precision datatypes (such as NUMBER) and binary precision datatypes such as BINARY_FLOAT and BINARY_DOUBLE, also present in Oracle, and that you have decided that the real numbers of your application domain need decimal precision storage. As you can read in Oracle's Reference, binary precision enables faster arithmetic calculations and "usually" (this is one of the key points of this article) reduces storage requirements. But BINARY_FLOAT and BINARY_DOUBLE are approximate numeric datatypes: they store approximate representations of decimal values rather than exact ones. For example, neither of them can exactly represent the value 0.1 (don't believe it? Try it, please), and this may not be acceptable for the banking or tax application you are developing.
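If you don't have an Oracle instance at hand, the same effect can be seen outside the database. The sketch below is in Java and rests on an assumption: that a Java double, being a 64-bit binary floating-point number, behaves like BINARY_DOUBLE for this purpose, while BigDecimal plays the role of a decimal type such as NUMBER. The class name is made up.

```java
import java.math.BigDecimal;

public class BinaryVsDecimalSketch {

    // The exact value actually held by the double literal 0.1.
    static BigDecimal exactValueOfDoubleTenth() {
        return new BigDecimal(0.1);
    }

    // Binary floating point: 0.1 + 0.2 is not exactly 0.3.
    static boolean doubleSumIsExact() {
        return 0.1 + 0.2 == 0.3;
    }

    // Decimal arithmetic: 0.1 + 0.2 is exactly 0.3.
    static boolean decimalSumIsExact() {
        BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        return sum.compareTo(new BigDecimal("0.3")) == 0;
    }

    public static void main(String[] args) {
        System.out.println(exactValueOfDoubleTenth()); // a long tail of digits, not 0.1
        System.out.println(doubleSumIsExact());
        System.out.println(decimalSumIsExact());
    }
}
```

The first line printed is close to 0.1 but not equal to it, which is exactly the kind of surprise you don't want in a tax calculation.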

As a decimal type, NUMBER allows you to indicate the precision (total number of digits) and scale (number of digits to the right of the decimal point) when defining a field of this type (NUMBER(p, s)).

Let's look at some examples:

NUMBER(9, 2): Nine significant digits in total (precision) of which 2 (scale) may be used for the decimal part of the value (digits to the right of the decimal point).

NUMBER(9): Nine significant digits in total, none of them for the decimal part. Yes, that's the way to restrict your fields to integer values.

NUMBER: "I will save whatever you give me" with an accuracy of up to 38 significant digits.

NUMBER(*,2): You set no limit on the precision, but the decimal part is reduced (rounded) to two digits.

NUMBER(9,-2): Nine digits for the integer part, which will be "rounded" at the last two digits (interesting), i.e.: 987,654,321 -> 987,654,300.
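The effect of the scale in the examples above, including the negative one, can be imitated with Java's BigDecimal. This is only an emulation under the assumption of half-up rounding; the helper name applyScale is made up, and it ignores the precision check that Oracle would also perform.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class NumberScaleSketch {

    // Rounds a decimal value to the given scale, the way a NUMBER(p, s)
    // column would round the digits beyond its scale (half-up assumed).
    static String applyScale(String value, int scale) {
        return new BigDecimal(value)
                .setScale(scale, RoundingMode.HALF_UP)
                .toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(applyScale("123.456", 2));    // scale 2, as in NUMBER(*,2)
        System.out.println(applyScale("123.456", 0));    // scale 0, as in NUMBER(9)
        System.out.println(applyScale("987654321", -2)); // scale -2, as in NUMBER(9,-2)
    }
}
```

The last call reproduces the 987,654,321 -> 987,654,300 example: a negative scale rounds to the left of the decimal point.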

Most people would be happy with the lazy definition of NUMBER (without p or s). But that is not our case, and when defining the accuracy of a NUMBER we will be considering at least two objectives:

a) To restrict the entry of data: If we specify precision and scale, we add a constraint that shields the data more strongly (the further "downstream" the better, since the shield will apply to any application developed on this database).

Problem: It is vital to know the needs of the field precisely and in advance, which is not always easy. For a field intended to hold, for instance, the surface area of a building in a cadastral application, precision and scale can be set without further problems (usually a two-digit scale for area values in square meters).

But what precision and scale should be assigned to a coefficient K that can be set arbitrarily by a tax law that changes every year? Perhaps what today is a ratio with two decimal digits will have six tomorrow, forcing us to redefine the table structure every year, with the usual impact on a production environment.

b) Saving disk space: It is commonly thought that if you reduce precision, storage needs will shrink in the same measure, and therefore you will save disk space. Is this true?

According to Oracle's Reference:

Oracle Database stores numeric data in variable-length format. Each value is stored in scientific notation, with 1 byte used to store the exponent and up to 20 bytes to store the mantissa. The resulting value is limited to 38 digits of precision. Oracle Database does not store leading and trailing zeros. For example, the number 412 is stored in a format similar to 4.12 x 10², with 1 byte used to store the exponent(2) and 2 bytes used to store the three significant digits of the mantissa(4,1,2). Negative numbers include the sign in their length.

Taking this into account, the column size in bytes for a particular numeric data value NUMBER(p), where p is the precision of a given value, can be calculated using the following formula:

ROUND((length(p)+s)/2)+1

where s equals zero if the number is positive, and s equals 1 if the number is negative.

Zero and positive and negative infinity (only generated on import from Oracle Database, Version 5) are stored using unique representations. Zero and negative infinity each require 1 byte; positive infinity requires 2 bytes.

That is, the size in bytes is variable and depends on the value stored in each case!!!

Let's try it:

SQL> create table tbl1 (
       as_number number(12)
     );

Table created.

SQL> insert into tbl1 values (20000000);
SQL> insert into tbl1 values (12345678);

SQL> select as_number, vsize(as_number) from tbl1;
....

 AS_NUMBER VSIZE(AS_NUMBER)
---------- ----------------
  20000000                2
  12345678                5

It is interesting to see that storing a value like 20000000 only requires 2 bytes, one for storing the mantissa (2) and another for the exponent of 10 (in this case 7). As said, the number of bytes used is dependent on the stored value.
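
The VSIZE figures above can be approximated outside the database. The following Python sketch is only a model built on assumptions, not Oracle's implementation: it assumes the mantissa is kept as pairs of decimal digits aligned on the decimal point, one byte per pair, plus one byte for the exponent, and it covers non-negative values only. The function name estimated_number_bytes is made up for this illustration.

```python
from decimal import Decimal


def estimated_number_bytes(value):
    """Estimate the bytes used by a non-negative Oracle NUMBER value.

    Assumed model (not Oracle's actual code): 1 exponent byte plus one
    byte per pair of decimal digits in the mantissa, with leading and
    trailing zeros dropped and at most 20 mantissa bytes.
    """
    d = Decimal(str(value))
    if d < 0:
        raise ValueError("this sketch only covers non-negative values")
    if d == 0:
        return 1  # zero has its own 1-byte representation

    _, digits, exponent = d.as_tuple()
    digits = list(digits)
    # Trailing zeros are not stored, so drop them and raise the exponent.
    while digits[-1] == 0:
        digits.pop()
        exponent += 1

    lowest_power = exponent                     # power of 10 of the last digit
    highest_power = exponent + len(digits) - 1  # power of 10 of the first digit
    # Pairs are aligned on the decimal point: the digit at 10**p lies in pair p // 2.
    pairs = highest_power // 2 - lowest_power // 2 + 1
    return 1 + min(pairs, 20)


for sample in (20000000, 12345678, 412):
    print(sample, estimated_number_bytes(sample))
```

Under these assumptions the sketch gives 2 bytes for 20000000 and 5 bytes for 12345678, the same values VSIZE reported, and 3 bytes for the 412 example from Oracle's Reference.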

Therefore, the first conclusion is that the number of bytes consumed is a function of the stored values. From this you would probably conclude that leaving out precision and/or scale has no negative effect on disk usage (nor on performance, especially if the field has an associated index), uhm... really?

Be careful: this holds when the stored values are integers, but not always when they are real numbers. For instance, suppose an UPDATE statement divides one field holding real numbers by another and stores the result in a third NUMBER field. If the division produces a long run of decimal digits (which are unlikely to be needed), the result can occupy the field's full capacity (38 digits). Of course, this can be avoided with a rounding function, but we don't want to rely on every current or future programmer who will evolve the system.
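
To get a feel for the effect, here is a small Python sketch in which the decimal module stands in for Oracle's arithmetic (an assumption made only for illustration; the 38-digit context mirrors the precision limit quoted above, and significant_digits is a hypothetical helper). It compares how many digits an unrounded quotient carries against the same quotient rounded to two decimals.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

# 38 significant digits, mirroring the precision limit quoted above.
ctx = Context(prec=38)

# An unrounded division versus the same result rounded to two decimals.
quotient = ctx.divide(Decimal("1"), Decimal("3"))
rounded = quotient.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP, context=ctx)


def significant_digits(d):
    """Count the digits of the coefficient, ignoring trailing zeros."""
    text = "".join(str(digit) for digit in d.as_tuple().digits)
    return len(text.rstrip("0"))


print(quotient, significant_digits(quotient))
print(rounded, significant_digits(rounded))
```

The unrounded quotient carries 38 significant digits while the rounded one carries 2, which is the difference a declared scale makes for every row without depending on each programmer remembering to round.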

You can read about this in this article, which explains it in detail.

There you will see that specifying the scale is highly recommended, because it lets you bound the decimal part, the main source of potential space "leaks".

Even when the values to store are integers, it is good practice to define such fields as NUMBER(*, 0) if we do not want to limit the precision, or as NUMBER(p, 0) if we want to limit the number of digits. This ensures that fields so defined will never hold anything but integers (and that they will use only the disk space needed to store the specific values inserted into them).

So, to avoid space leaks you should specify the scale (although, as said, if you can guarantee that your values will be integers, the problem won't exist)... provided, of course, that you have all the information needed to decide how to specify it...

General summary:

Whenever you need to restrict what a field can accept in terms of maximum values and/or decimal digits, I recommend doing so in the field definition, indicating precision and scale: NUMBER(p, s).

Regarding disk space savings, it is important to set the scale, especially for the results of real number calculations, because otherwise you risk occupying the full capacity of NUMBER fields. However, if the data is known to always be of integer type (a primary key based on a sequence (+1), for instance), we will not suffer this negative impact.

Note: This article is a refactored English version of a previous post on Tracasa's wiki by the same poster.

Sunday, February 01, 2009

TI-IT::Official Google Blog: "This site may harm your computer" on every search result?!?!

I didn't notice it at the time but, as you can see, it affected every web site on the Internet...

Here is the official explanation (human error):
Official Google Blog: "This site may harm your computer" on every search result?!?!

Saturday, January 31, 2009

Did they go mad at Google?

I was just searching with Google for references to the Microsoft Press "MCTS Self-Paced Training Kit (Exam 70-536)" book, and to my surprise all the results had the following warning beneath them: "Este sitio puede dañar tu equipo" (original text, as I was searching in Spanish), that is, something like "This site might damage your computer" (using the keywords MCTS, self-paced, 536).

I was so surprised because the list of "damaging" sites included, among others, Microsoft's msdn.microsoft.com and www.amazon.com.

Moreover, when I clicked on any of them, I was redirected to another Google page advising me not to go to the page I had searched for (or, if I did, it would be under my own responsibility ;)).

My first reaction (I should be calmer, I know) was to change my default search engine from Google to "Live Search"... Next thought: the people in charge of those firms wouldn't be very happy if they noticed that their sites were not reachable through Google (as we all know, nobody uses it...).

Finally, two minutes later, I repeated the same search (same keywords) in Google, and then the inexplicable took place: the same results now appeared without the evil warning.

Can you understand that? Are developments at Google properly tested before they are published on the Internet? How much of those companies' business may have been affected in this lapse of time?

3D Modelling

When we talk about a 3D cadastre, we should keep these principles in mind:

There is a need to register cadastral objects (cadastral units, properties) in three dimensions, because the traditional two dimensions are not enough.
A 3D cadastral object must therefore be considered a volume, as opposed to a surface (the usual representation).
There will not always be a perfect match between the concept of floor and the volume occupied by a cadastral object.
The volume occupied by a unit does not have to be a simple solid, although it will normally be a grouping of simple regular solids.
Within a volume there may be platforms that must be taken into account, since they represent usable surface (could they also be considered virtual volumes when some side is left open? I don't like it).
Generally applicable: buildings, tunnels, caves, etc.

Tuesday, February 01, 2005

Creation of "children" = modification of "parents"?

Part of the general philosophy of the history is that it does not record the creation of the entities for which it is kept.

But what happens when a new child is added to an entity (e.g. a parcel gets a new subarea, or a subarea gets a new unit)?

Can't those child creations be considered modifications of the parent?

That way, when querying the history of a parent, it should tell us that it has had children.

This fits with the idea of having a history of states + a change log (which is where these records would go).

The current history is a change log (with some changes in greater detail, such as transfers) that lacks creations. It is in turn complemented by the _b tables.

As a stopgap: add child creations to the history (as of today: subareas, units and subparcels).

Tuesday, January 04, 2005

History vs. change log

Two types of questions can be put to a history:
Type 1: What was a given element like on a given date? or What is an element like today whose attribute on a given date had a given value?
Answering this type of question requires keeping every state an element goes through, storing dates, user and a reference to the change events that drive them.

Type 2: What changes has a given element undergone? More specifically, one could ask: What kind of physical changes has a subparcel undergone? What legal changes (transfers) has an urban unit undergone? (this is what we are used to in the cadastre).
Answering these questions seems to require keeping a record of the operations performed on the elements of the data model that are to be traced.
Here, transfers (from whom to whom, groups...), reference changes and lineage (segregations, aggregations, etc.) are of special interest.

Answers are also needed at different levels, where some can group others (what has happened in a parcel implies answering what has happened in the units it contains).

In the case of transfers, one could try to reconstruct them from a log table (joining the table with itself), but care would be needed with the dates (expiry=effective, being different for each chained transfer). Another possibility would be to present the history of ownership as the succession of states, since each change of state implies a transfer (same problem with the date ranges, since they are what determine the states).

So that the information is not exposed to possible inconsistencies, it would therefore be necessary to store transfers explicitly (identifying each transfer), much as a lineage relationship could be handled.
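
The two kinds of question can be sketched with two separate structures: a list of states with validity dates for Type 1, and explicitly stored transfers for Type 2. This is only a minimal illustration in Python; every name in it (OwnershipState, Transfer, state_as_of, changes_of) and the sample data are hypothetical and not taken from any real cadastral system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class OwnershipState:
    """One state of an element, valid from `effective` until `expiry`."""
    unit_id: str
    owner: str
    effective: date
    expiry: Optional[date]  # None means the state is still current


@dataclass
class Transfer:
    """An explicitly stored change event (who transferred to whom)."""
    transfer_id: int
    unit_id: str
    from_owner: str
    to_owner: str
    on: date


def state_as_of(states, unit_id, when):
    """Type 1 question: what was the element like on a given date?"""
    for s in states:
        still_valid = s.expiry is None or when < s.expiry
        if s.unit_id == unit_id and s.effective <= when and still_valid:
            return s
    return None


def changes_of(transfers, unit_id):
    """Type 2 question: what changes has the element undergone?"""
    return sorted((t for t in transfers if t.unit_id == unit_id), key=lambda t: t.on)


# Made-up sample data: the expiry of one state equals the effective date of the next.
states = [
    OwnershipState("U1", "Ana", date(2000, 1, 1), date(2003, 6, 1)),
    OwnershipState("U1", "Luis", date(2003, 6, 1), None),
]
transfers = [Transfer(1, "U1", "Ana", "Luis", date(2003, 6, 1))]

print(state_as_of(states, "U1", date(2002, 5, 5)).owner)
print([t.transfer_id for t in changes_of(transfers, "U1")])
```

Because each transfer carries its own identifier, the Type 2 answer does not depend on matching date ranges between consecutive states.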

History, case files and documents

Managing the history requires keeping the reference to the change event that produced a given modification. In most of the systems we work with, this is usually a document (deed, etc.).

Another possibility is to associate changes with a case file. This has the advantage of being more versatile, portable and easy to handle. It is ideal when the case file has a single document that produces all the changes to be made, since it is then enough to reference the case file.

Otherwise, if a single case file contains several change events that must be recorded in the system, then besides the case file it will be necessary to reference those change events (documents, etc.) from the entities they affect.

History and error correction

Within the history problem, the question of error correction arises.

What happens when a change that was made turns out to be wrong?

In the database this sometimes happens, and the change is usually "undone" so that no trace of it remains (as if it had never happened). This is not entirely correct: although the database then does reflect the history of reality (what was never true in reality is removed), the information supporting the fact that the database did contain the error is lost (and the error may have left indelible traces such as issued certificates, yearly snapshots, statistics, etc.).

This is also directly related to transaction time (when things are recorded in the system: recording date) and real-world time (when things happen in the real world: deed date).

The idea is to be able to keep a record of both:
Option 1:
Create a custom history that is always free of errors (fixes would be applied to both the current and the historical view).
Also keep an automatic log of everything that happens in the DB.
Option 2:
Keep a single historical log in which records corresponding to erroneous modifications are flagged in some way (so they can be taken into account or not when querying the history).
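
Option 2 could look roughly like the following Python sketch: a single log whose entries carry both dates (real-world and recording) plus a flag for erroneous modifications, so the same log can be read with or without the mistakes. All names (LogEntry, history, erroneous) and the sample entries are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LogEntry:
    entity_id: str
    description: str
    real_date: date       # when it happened in the real world (e.g. deed date)
    recorded_date: date   # when it was recorded in the system
    erroneous: bool = False  # set when the modification is later found to be wrong


def history(log, entity_id, include_errors=False):
    """Return the entries of an entity, optionally keeping flagged errors."""
    entries = [
        e for e in log
        if e.entity_id == entity_id and (include_errors or not e.erroneous)
    ]
    return sorted(entries, key=lambda e: e.real_date)


# Made-up sample: the second entry was a mistake, corrected by the third.
log = [
    LogEntry("P1", "owner set to Ana", date(2004, 3, 1), date(2004, 3, 10)),
    LogEntry("P1", "owner set to Luis", date(2004, 9, 1), date(2004, 9, 5), erroneous=True),
    LogEntry("P1", "owner set to Eva", date(2004, 9, 1), date(2004, 11, 2)),
]

print([e.description for e in history(log, "P1")])
print(len(history(log, "P1", include_errors=True)))
```

Nothing is deleted: the flagged entry still explains why, for instance, a certificate issued between the two recording dates showed the wrong owner.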

Precision: Reality vs. Recorded Reality
A system does not have to capture every state of reality; certain intermediate states may be enough for it (for a document covering an inheritance acceptance and a sale, it could be enough to record the overall transfer, ignoring the intermediate transfer).