INDEX
Explanations
instances of conditional phrases and logical structures in text
NEGATIVE LOGITS
ses        -0.18
(          -0.15
phans      -0.15
eer        -0.15
rael       -0.14
ェ         -0.14
bra        -0.14
ients      -0.14
ipment     -0.14
@nate      -0.14
POSITIVE LOGITS
adays       0.29
oret        0.28
gether      0.25
oretical    0.25
etheless    0.22
atre        0.21
bidden      0.21
этому       0.19
quarters    0.19
jourd       0.19
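The two lists above rank vocabulary tokens by how strongly this feature directly raises or lowers their output logits. A common way to build such lists, assumed here rather than confirmed by the dashboard, is to project the feature's decoder direction through the unembedding matrix W_U (ignoring any final LayerNorm) and keep the top k in each direction. The sketch below uses plain Python lists; `logit_effects` and `top_logits` are hypothetical helpers, and the toy matrices are illustrative only, not taken from any model.

```python
# Minimal sketch (assumption): a feature's logit effect on each token is the
# dot product of its decoder direction with that token's column of W_U.
# Final LayerNorm and bias terms are ignored here.


def logit_effects(decoder_dir, W_U, vocab):
    """Return a list of (token, score) pairs, one per vocabulary entry.

    decoder_dir: d_model floats (the feature's decoder row; hypothetical input)
    W_U: d_model rows, each holding vocab_size floats (unembedding matrix)
    vocab: vocab_size token strings
    """
    d_model = len(decoder_dir)
    return [
        (tok, sum(decoder_dir[i] * W_U[i][j] for i in range(d_model)))
        for j, tok in enumerate(vocab)
    ]


def top_logits(decoder_dir, W_U, vocab, k=10):
    """Return (negative, positive): the k lowest- and k highest-scoring tokens."""
    ranked = sorted(logit_effects(decoder_dir, W_U, vocab), key=lambda p: p[1])
    negative = ranked[:k]                  # most negative score first
    positive = list(reversed(ranked))[:k]  # most positive score first
    return negative, positive


# Toy example with d_model = 2 and a three-token vocabulary.
decoder_dir = [1.0, -0.5]
W_U = [[0.2, -0.1, 0.3],
       [0.4, 0.2, -0.6]]
vocab = ["a", "b", "c"]
negative, positive = top_logits(decoder_dir, W_U, vocab, k=2)
print(negative)  # "b" (score -0.2) first, then "a" (score 0.0)
print(positive)  # "c" (score 0.6) first, then "a" (score 0.0)
```

Because the score is a single dot product per token, the same decoder direction yields both lists; only the sort direction differs.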
Activation density: 0.151%
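Activation density is typically reported as the fraction of token positions on which the feature fires. Read that way, 0.151% is 0.00151 of tokens, or roughly one token in every 660 (1 / 0.00151 ≈ 662). A minimal sketch of the computation follows, assuming "fires" means an activation strictly above a threshold of 0; the dashboard does not state its threshold, and `activation_density` is a hypothetical helper.

```python
# Minimal sketch: activation density as the fraction of token positions
# whose feature activation exceeds a threshold (0.0 is an assumption).


def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy activations over five token positions (illustrative, not real data).
print(activation_density([0.0, 0.5, 0.0, 0.0, 1.2]))  # 0.4

# Converting the reported 0.151% into "one token in N".
density = 0.151 / 100
print(round(1 / density))  # 662
```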