Explanations
phrases including the word "community"
instances of the substring "com" within words
Negative Logits
Krug        -0.63
Wilde       -0.63
Refuge      -0.63
steroids    -0.62
AFL         -0.59
recall      -0.59
Ducks       -0.59
514         -0.58
Wings       -0.57
Flan        -0.57
Positive Logits
pleted      1.31
fortable    1.27
puters      1.26
pleting     1.13
pletion     1.11
puting      1.10
mitted      1.02
mented      0.99
edy         0.97
parable     0.96
Activations Density 0.024%
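
The positive-logit tokens listed above all look like word endings that follow "com", which fits the second explanation (the substring "com" within words). The short Python sketch below is only an illustrative check of that reading: it prepends "com" to each listed token and prints the result. The names `PREFIX`, `positive_logit_tokens`, and `complete` are hypothetical, and the idea that the feature promotes continuations of "com" is an interpretation of the lists, not something the page states.

```python
# Top positive-logit tokens, copied from the list above.
positive_logit_tokens = [
    "pleted", "fortable", "puters", "pleting", "pletion",
    "puting", "mitted", "mented", "edy", "parable",
]

# Assumption: the feature fires on "com" and boosts tokens that continue it.
PREFIX = "com"


def complete(prefix, suffixes):
    """Join the prefix with each suffix token to form candidate words."""
    return [prefix + suffix for suffix in suffixes]


completed_words = complete(PREFIX, positive_logit_tokens)

for token, word in zip(positive_logit_tokens, completed_words):
    print(f"{PREFIX} + {token} = {word}")
```

Each of the ten joined strings (for example "completed", "comfortable", "computers", "comedy") is an ordinary English word, while the negative-logit tokens show no comparable pattern.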