INDEX
Explanations
words and expressions that convey boldness or intensity
Negative Logits
trinsic   -0.18
orns      -0.15
ανδ       -0.14
rypt      -0.14
ustin     -0.14
aina      -0.14
abel      -0.13
oga       -0.13
á»ĥ       -0.13
kinson    -0.13
Positive Logits
ly              0.25
ness            0.25
personalities   0.19
s               0.19
symbol          0.18
çĦ¶             0.17
ifference       0.17
statement       0.17
ively           0.17
enough          0.17
Activations Density 0.043%