Explanations
intensity and frequency of descriptive phrases that convey strong emotional or evaluative sentiments
Negative Logits
onica      -0.16
reon       -0.16
olib       -0.16
Ñĵ         -0.15
olith      -0.14
cha        -0.14
ÅĻet       -0.14
aste       -0.14
idis       -0.14
canonical  -0.14
Positive Logits
Įĵ         0.14
ars        0.14
rina       0.14
stakes     0.13
cliffe     0.13
anta       0.13
MAND       0.13
881        0.13
tun        0.13
206        0.13
Activations Density: 0.062%