INDEX
Explanations
instances of the word "true" in various contexts
Negative Logits
ronics      -0.20
trib        -0.17
lot         -0.17
ses         -0.17
shire       -0.16
ummings     -0.15
recht       -0.15
リーズ      -0.15
truth       -0.15
ized        -0.14
Positive Logits
/false      0.42
-blue       0.30
st          0.28
-life       0.23
caller      0.23
blood       0.21
-blood      0.21
fully       0.20
-bel        0.19
believers   0.19
Activations Density 0.061%
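
The page does not define the density figure. A minimal sketch of one common reading follows, assuming it means the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 6 of which activate the feature.
sample = [0.0] * 10_000
for position in (3, 57, 912, 4000, 7321, 9998):
    sample[position] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.061% would mean the feature is active on roughly 6 of every 10,000 tokens.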