INDEX
Explanations
phrases related to persuasion or strong personal beliefs
phrases emphasizing trust and belief in statements
Negative Logits
entin        -0.87
sidel        -0.73
imer         -0.72
ynski        -0.72
Thumbnail    -0.72
ija          -0.70
uj           -0.68
idian        -0.66
atlantic     -0.64
eners        -0.64
Positive Logits
sincerity       0.73
passionately    0.68
uncond          0.65
passion         0.65
hype            0.64
Dare            0.62
bly             0.61
courageous      0.59
acci            0.59
implicitly      0.59
Activations Density: 0.060%