INDEX
Explanations
instances of the word "thin" or variations of it
Negative Logits
ãĥ¼ãĥ    -0.17
iences    -0.17
eb        -0.16
inous     -0.15
ufen      -0.15
ean       -0.15
hem       -0.15
ein       -0.15
pheres    -0.14
ains      -0.14
Positive Logits
ning      0.43
NING      0.28
ners      0.26
ening     0.25
ness      0.24
kest      0.22
/th       0.21
ningen    0.20
ened      0.20
slice     0.20
Activations Density: 0.023%