INDEX
Explanations
discussions related to belief systems and the distortion of truths in narratives
Negative Logits
Jefus       -0.87
Efq         -0.84
myſelf      -0.81
itſelf      -0.77
houſe       -0.73
preſent     -0.68
pleaſure    -0.68
ſelf        -0.67
neſs        -0.67
Chrift      -0.67
Positive Logits
croire            0.90
claims            0.78
claim             0.77
belief            0.75
believe           0.69
rằng              0.69
RegressionTest    0.67
Belief            0.67
believes          0.66
Claims            0.65
Activations Density: 0.677%