Explanations
instances of the word "plain" and its variations
Negative Logits
ernaut    -0.18
gaard     -0.18
him       -0.17
klad      -0.16
naments   -0.16
aurus     -0.16
amines    -0.15
алов      -0.15
ampler    -0.15
ally      -0.14
Positive Logits
jane      0.28
clo       0.24
chant     0.24
vanilla   0.21
-таки     0.20
est       0.20
Jane      0.20
Vanilla   0.19
Dealer    0.18
Jane      0.18
Activations Density: 0.012%