INDEX
Explanations
the word "thus" or variations of it, indicating conclusions or implications
Negative Logits
readcr      -0.17
rosse       -0.16
EFA         -0.15
olec        -0.15
thon        -0.15
eer         -0.14
chai        -0.14
iece        -0.14
opt         -0.14
hai         -0.13
Positive Logits
forth       0.30
ly          0.28
rd          0.25
far         0.21
soever      0.19
iasm        0.18
infeld      0.17
../../../   0.17
far         0.16
elve        0.16
Activations Density: 0.018%