INDEX
Explanations
names or words with 'll' in them
recurring instances of a specific syllable or phonetic pattern
Negative Logits
[_            -0.62
"{            -0.59
hered         -0.57
conformity    -0.57
Chan          -0.56
lished        -0.55
effected      -0.55
··            -0.54
в             -0.54
ãĤ§           -0.54
Positive Logits
ibrary        1.19
oyd           1.18
uminati       1.18
ounge         1.12
sburgh        1.03
owship        1.02
ipop          1.01
ateral        0.99
ength         0.98
iard          0.98
Activations Density 0.038%