INDEX
Explanations
Mentions of things being reliable
Instances of the word "reliable"
Negative Logits
Token       Logit
ften        -0.93
owitz       -0.89
horn        -0.84
ifling      -0.80
thus        -0.80
hunt        -0.78
ogenesis    -0.78
thur        -0.75
pper        -0.75
ony         -0.74
Positive Logits
Token        Logit
reliable     1.13
reliability  1.12
unreliable   0.96
estim        0.89
iability     0.89
conclud      0.87
trustworthy  0.87
iable        0.86
narrator     0.84
intervals    0.83
Activation Density: 0.013%