Explanations
proper nouns or titles, particularly those in uppercase
occurrences of the word "THE" in various contexts
Negative Logits
fitting    -0.72
cd         -0.71
java       -0.71
uid        -0.70
let        -0.69
hm         -0.68
iod        -0.67
fired      -0.67
ander      -0.66
perties    -0.66

Positive Logits
ORY        1.15
LAST       1.15
FORM       1.10
IMAGES     1.09
STORY      1.05
ATER       1.05
IVERS      1.03
ISM        1.03
WEEK       1.02
DARK       1.02

Activation Density: 0.019%