INDEX
Explanations
the word "True" in various contexts
instances of the word "true"
NEGATIVE LOGITS
uled      -0.79
acco      -0.77
Pages     -0.77
ONES      -0.76
adish     -0.75
ocene     -0.75
Corp      -0.74
served    -0.73
chains    -0.70
oleon     -0.70
POSITIVE LOGITS
believers  0.93
believer   0.90
stic       0.76
ignment    0.73
ll         0.72
positives  0.71
çĭ         0.70
polit      0.70
freshman   0.69
sell       0.69
Activations Density 0.017%