Explanations
the word "true" with a high degree of activation
references to authenticity or being genuine
Negative Logits
served     -0.76
uled       -0.75
RAW        -0.75
Pages      -0.74
acco       -0.74
ocene      -0.73
ONES       -0.73
Corp       -0.73
ambo       -0.71
chains     -0.69
Positive Logits
believers   0.90
believer    0.88
ignment     0.73
positives   0.69
itability   0.68
polit       0.68
patriot     0.67
freshman    0.67
ance        0.66
ll          0.66
Activations Density: 0.018%