Explanations
instances of the word "thus" indicating conclusion or causation
Negative Logits
kv        -0.15
readcr    -0.15
ton       -0.15
ignKey    -0.14
uchos     -0.14
olec      -0.14
å®®       -0.14
thon      -0.14
chai      -0.14
izm       -0.14
Positive Logits
forth     0.38
ly        0.30
forward   0.22
LY        0.20
far       0.19
iasm      0.19
ìį¨       0.17
infeld    0.16
-called   0.16
far       0.15
Activations Density: 0.021%