INDEX
Explanations
references to the word "silver."
Negative Logits
Token       Logit
çķ          -0.19
eks         -0.16
say         -0.15
sj          -0.15
lescope     -0.15
sav         -0.15
ergarten    -0.14
esor        -0.14
sWith       -0.14
sv          -0.14
Positive Logits
Token       Logit
ware        0.32
ado         0.32
lining      0.29
lining      0.26
fish        0.23
stein       0.23
wares       0.23
-haired     0.23
-dollar     0.22
stone       0.22
Activations Density 0.012%