INDEX
Explanations
statements indicating belief or disbelief in specific information
phrases expressing belief or skepticism
Negative Logits
ific          -0.80
xes           -0.74
issance       -0.69
quit          -0.66
rection       -0.65
udo           -0.63
Thumbnail     -0.63
wast          -0.62
backer        -0.62
nesses        -0.62
Positive Logits
eele          0.71
passionately  0.67
ASC           0.64
Hier          0.63
REL           0.62
Orig          0.61
================================================================  0.61
sincerity     0.61
longevity     0.58
apesh         0.58
Activations Density 0.175%