Explanations
words related to various scientific and medical terms
Negative Logits
Token       Logit
m           -0.82
ness        -0.74
s           -0.71
baix        -0.68
ling        -0.67
silêncio    -0.66
us          -0.65
r           -0.64
ms          -0.64
masing      -0.63
Positive Logits
Token       Logit
auso        1.03
eo          0.98
Quo         0.97
o           0.95
Ando        0.95
оо          0.94
Malo        0.91
Dodo        0.91
Puro        0.91
eco         0.90
Activations Density: 1.174%
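
The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and not taken from the page:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of active tokens in a sample.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy sample: 1000 tokens, 12 of which activate the feature.
sample = [0.0] * 988 + [0.5] * 12
density = activation_density(sample)
print(f"{density:.3%}")  # prints 1.200%
```

Under this reading, a density of 1.174% would mean the feature fires on roughly 12 of every 1000 sampled tokens.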