INDEX
Explanations
the word "true" in various contexts
Negative Logits
Token      Logit
lahoma     -0.16
trap       -0.16
shed       -0.16
shire      -0.15
onse       -0.15
sburg      -0.15
ropolis    -0.15
trib       -0.14
tron       -0.14
land       -0.13
Positive Logits
Token      Logit
/false     0.26
caller     0.24
-blue      0.24
st         0.23
-life      0.19
izz        0.19
820        0.17
sted       0.17
-bel       0.17
_false     0.17
Activations Density: 0.024%