INDEX
Explanations
instances of the word "or."
Negative Logits
Token      Logit
LINE       -0.59
legraph    -0.58
edia       -0.54
Cipher     -0.54
juggling   -0.54
IDENT      -0.54
rapists    -0.54
Cy         -0.54
Cascade    -0.53
robber     -0.53
Positive Logits
Token      Logit
nam        1.08
nery       1.04
chard      1.02
leans      0.98
lando      0.94
acular     0.90
acles      0.89
gin        0.87
acle       0.86
phan       0.86
Activations Density 0.013%
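
The density figure above can be read as a simple ratio. The sketch below assumes that "activations density" means the fraction of sampled token positions on which the feature's activation is above zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 13 of which activate.
sample = [0.0] * 99_987 + [1.5] * 13
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.013%
```

Under this reading, a density of 0.013% corresponds to roughly 13 activating positions per 100,000 tokens sampled.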