Explanations
Specific mentions of the word "fact".
The word "fact" and its variations, indicating a focus on statements of truth or assertion.
Negative Logits
Token      Logit
neys       -0.65
unlocks    -0.62
yles       -0.61
ike        -0.60
ware       -0.59
skin       -0.59
Default    -0.58
otor       -0.58
fred       -0.58
meal       -0.58
Positive Logits
Token      Logit
fact       3.77
fact       2.09
Fact       1.93
Fact       1.85
facts      1.67
truth      1.55
reality    1.43
facts      1.32
Facts      1.28
irony      1.18
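Dashboards of this kind commonly produce the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below illustrates that ranking step under that assumption (and ignoring any final layer norm). The vocabulary, decoder vector, unembedding values, and the helper names `logit_effects` and `top_k` are all hypothetical toy choices, not taken from this feature or any real model.

```python
# Hypothetical toy example: vocab, decoder_dir, and unembed are invented
# for illustration and do not come from any real model or SAE.

def logit_effects(decoder_dir, unembed, vocab):
    """Score each token as the dot product of the feature's decoder direction
    with that token's column of the unembedding matrix (d_model x vocab)."""
    d_model = len(decoder_dir)
    scores = []
    for t, token in enumerate(vocab):
        score = sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        scores.append((token, score))
    return scores

def top_k(scores, k, positive=True):
    """Highest scores first for positive logits, lowest first for negative."""
    return sorted(scores, key=lambda pair: pair[1], reverse=positive)[:k]

vocab = ["fact", "truth", "meal", "skin"]
decoder_dir = [1.0, 0.5]              # toy d_model = 2
unembed = [[2.0, 1.0, -0.5, -0.2],    # row i = model dimension i
           [1.0, 1.0, -0.2, -0.6]]

scores = logit_effects(decoder_dir, unembed, vocab)
print(top_k(scores, 2, positive=True))   # "fact" and "truth" rank highest
print(top_k(scores, 2, positive=False))  # "meal" and "skin" rank lowest
```

In a real dashboard the same ranking would run over the full vocabulary, which is how subword fragments such as "neys" or "otor" can surface in the negative list.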
Activation Density: 0.034%
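Assuming activation density means the fraction of sampled token positions on which the feature's activation is nonzero, 0.034% works out to roughly 34 of every 100,000 tokens. The sample size behind this figure is not given here. The sketch below shows the calculation on invented toy data; `activation_density` and its `threshold` parameter are hypothetical names.

```python
# Hypothetical helper: density = share of token positions where the
# feature's activation exceeds a threshold (0.0 means "any nonzero firing").

def activation_density(activations, threshold=0.0):
    """Fraction of token positions with activation above threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Toy data: the feature fires on 3 of 10,000 token positions.
acts = [0.0] * 10_000
acts[10] = 2.1
acts[500] = 0.7
acts[9_999] = 4.3

density = activation_density(acts)
print(f"{density:.3%}")  # → 0.030%
```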