Explanations
repeated references to the word "thing."
Negative Logits
thin      -0.17
lek       -0.15
ned       -0.15
licht     -0.15
sites     -0.15
ogne      -0.15
ised      -0.14
aptor     -0.14
ted       -0.14
ates      -0.14
Positive Logits
ummy       0.24
/people    0.23
ToDo       0.23
ummies     0.20
happening  0.20
/person    0.18
562        0.18
854        0.16
Happ       0.16
happen     0.16
Activations Density 0.066%