Explanations
negations related to the concept of existence or reality
Negative Logits
piú          -0.61
pondre       -0.57
wijze        -0.54
ElementRef   -0.53
zijne        -0.53
början       -0.52
alcuna       -0.52
nessuna      -0.50
staden       -0.50
použití      -0.50
Positive Logits
<bos>        0.91
t            0.73
IfNot        0.65
んじゃない    0.53
rett         0.53
ratt         0.52
Aussie       0.52
tim          0.52
tsch         0.52
Arty         0.51
Activations Density: 0.097%