Explanations
expressions that indicate opinions or subjective assessments
Negative Logits
Token     Logit
ivent     -0.17
aldo      -0.17
qv        -0.15
ote       -0.14
oid       -0.14
Leigh     -0.14
alty      -0.14
Maher     -0.14
ent       -0.13
agnost    -0.13
Positive Logits
Token     Logit
uter      0.15
uvian     0.15
ायन       0.15
iyon      0.15
Incoming  0.14
ação      0.14
λικά      0.14
orz       0.14
959       0.14
085       0.13
Activations Density: 0.005%