INDEX
Explanations
adjectives that describe qualities and characteristics
Negative Logits
rike      -0.16
eya       -0.16
onto      -0.16
ÙĴس       -0.15
rikes     -0.15
illage    -0.15
illow     -0.15
lite      -0.15
bes       -0.15
eso       -0.14
Positive Logits
enough    0.19
DITION    0.17
ness      0.16
ly        0.16
izza      0.15
ibar      0.14
ÙĤدر      0.14
ä¸Ķ       0.14
raig      0.14
tvrt      0.14
Activations Density 0.302%