Explanations
expressions of positivity and admiration
Negative Logits
oko          -0.18
oric         -0.18
             -0.17
orie         -0.16
edb          -0.15
elem         -0.15
ok           -0.14
ed           -0.14
greatness    -0.14
.lv          -0.14
Positive Logits
-grand        0.21
lest          0.21
-looking      0.20
ideos         0.16
ulously       0.16
mente         0.15
Reputation    0.15
acon          0.15
oplast        0.15
ikip          0.15
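The page does not say how these logit scores are produced. A common approach in sparse-autoencoder dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and list the tokens with the largest and smallest resulting scores. The minimal sketch below uses toy numbers; `logit_effects`, `top_and_bottom`, and the tiny vocabulary are hypothetical illustrations, not the dashboard's actual code or data.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], w_unembed: List[List[float]]) -> List[float]:
    """Project one feature's decoder direction through the unembedding (assumed method).

    decoder_dir has length d_model; w_unembed has d_model rows and vocab_size columns.
    Returns one score per vocabulary token: decoder_dir @ w_unembed.
    """
    vocab_size = len(w_unembed[0])
    return [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    scores: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
w_unembed = [
    [0.2, -0.1, 0.0, 0.3],
    [0.4, 0.2, -0.2, 0.0],
]
scores = logit_effects(decoder_dir, w_unembed)
positive, negative = top_and_bottom(scores, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

With these toy values the positive list starts with "d" and "c" and the negative list with "b" and "a"; on a real model the same ranking would yield lists shaped like the two above.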
Activation density: 0.053%
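Activation density is usually read as the percentage of tokens in a sample on which the feature is active. The page gives neither the sample size nor the activation threshold, so the sketch below assumes "active" means an activation above 0 and uses a hypothetical 100,000-token sample. Under those assumptions, 0.053% corresponds to 53 active tokens.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds threshold (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 100,000 tokens, of which 53 activate the feature.
acts = [0.0] * 100_000
for i in range(53):
    acts[i * 1000] = 1.5

print(f"{activation_density(acts):.3f}%")
```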