Explanations
phrases that indicate boasting or pride in achievements or qualities
Negative Logits
sst          -0.16
ugh          -0.15
odge         -0.14
ty           -0.14
614          -0.14
xCF          -0.14
nehmer       -0.14
trinsic      -0.13
-thumbnail   -0.13
apsed        -0.13
Positive Logits
ably         0.23
フレ         0.15
abbage       0.15
Vak          0.14
ета          0.13
cif          0.13
lique        0.13
mans         0.13
śmy          0.13
jin          0.13
Activations Density 0.025%