Explanations
expressions of moral or ethical guidance
Negative Logits
without          -0.42
hup              -0.39
WITHOUT          -0.39
alemanes         -0.38
udia             -0.37
/?               -0.37
czemu            -0.37
tưởng            -0.37
如果不           -0.36
GeneratedValue   -0.36
Positive Logits
nor              3.40
nor              2.67
Nor              2.66
Nor              2.61
Tampoco          2.10
而是             2.00
NOR              1.93
sondern          1.89
neither          1.85
بلکه             1.82
Activations Density: 0.476%