INDEX
Explanations
negative descriptors related to conditions or experiences
Negative Logits
Züge          -0.47
ciepła        -0.38
fromnode      -0.36
phosa         -0.36
tabung        -0.35
lète          -0.34
veremos       -0.34
Koordinaten   -0.34
iyor          -0.34
ărilor        -0.34
Positive Logits
bad           1.08
Bad           1.02
Bad           0.97
BAD           0.93
bad           0.92
BAD           0.85
badly         0.65
mauvais       0.62
mauvaise      0.57
luck          0.57
Activations Density: 0.084%