Explanations
Instances of the word "similar" used as a descriptor or in a comparison
Negative Logits
een         -0.17
yp          -0.17
ví          -0.16
eln         -0.16
essa        -0.15
printStats  -0.15
eer         -0.15
eter        -0.14
ngr         -0.14
hete        -0.14
Positive Logits
-minded      0.23
ily          0.22
mente        0.21
-sex         0.20
teenth       0.18
weise        0.18
inded        0.17
minded       0.17
etto         0.17
-looking     0.17
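The two lists are the extremes of one score per vocabulary token, sorted in both directions. A minimal sketch of that ranking step follows. It assumes the per-token scores are already computed (commonly, though not confirmed here, as the feature's decoder direction dotted with each token's unembedding column); the function name `top_and_bottom` is hypothetical, and the toy scores are a small subset of the values listed above.

```python
# Hypothetical sketch: pick the k most positive and k most negative tokens
# from a precomputed {token: score} mapping. How the scores are produced
# (e.g. decoder direction . unembedding) is an assumption, not shown here.

def top_and_bottom(
    scores: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return (top k by score descending, bottom k by score ascending)."""
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending
    bottom = ranked[:k]        # most negative scores first
    top = ranked[::-1][:k]     # most positive scores first
    return top, bottom


# Toy subset of the scores listed above.
toy_scores = {
    "-minded": 0.23, "ily": 0.22, "mente": 0.21,
    "een": -0.17, "yp": -0.17, "eln": -0.16,
}
top, bottom = top_and_bottom(toy_scores, 2)
print(top)     # [('-minded', 0.23), ('ily', 0.22)]
print(bottom)  # [('een', -0.17), ('yp', -0.17)]
```

Sorting once and reading both ends keeps the two lists consistent with each other; for a full vocabulary, `heapq.nlargest`/`heapq.nsmallest` would avoid sorting every token.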
Activation Density: 0.029%
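To put the density figure in perspective: if it is read as the share of tokens on which the feature is active (an assumed definition; the page does not state its threshold or sample), 0.029% means roughly 29 active tokens per 100,000. A minimal sketch of that calculation, with a hypothetical `activation_density` helper and a threshold of 0 assumed:

```python
# Hypothetical sketch: activation density as the percentage of tokens whose
# feature activation exceeds a threshold (threshold = 0.0 is an assumption).

def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Return the percentage of activations strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: 29 nonzero activations among 100,000 tokens.
acts = [0.0] * 100_000
for i in range(29):
    acts[i * 1000] = 1.5
print(activation_density(acts))  # 0.029
```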