INDEX
Explanations
articles and common determiners in text
Negative Logits
inmueble    -0.64
sonno       -0.63
cálido      -0.61
précie      -0.60
således     -0.59
ciasc       -0.59
ragioni     -0.58
dermed      -0.56
terecht     -0.56
niksi       -0.56
Positive Logits
kasarigan   0.85
thing       0.79
دانشنامهٔ   0.77
stupid      0.76
guy         0.74
whole       0.72
other       0.71
crappy      0.71
WHOLE       0.69
ones        0.68
Activations Density 0.581%