Explanations
conjunctions and phrases related to logical connections and conditions
Negative Logits
isc          -0.18
em           -0.15
ony          -0.15
oda          -0.15
app          -0.15
attempts     -0.14
bre          -0.14
avenport     -0.14
ano          -0.14
ilton        -0.14
Positive Logits
.Objects     0.17
-threat      0.17
frec         0.16
hadn         0.16
perty        0.16
hud          0.16
uetype       0.15
adera        0.15
лот          0.15
frequently   0.15
Activations Density 0.012%