INDEX
Explanations
the word "fore" followed by a high number
Negative Logits
ensable      -0.63
Archdemon    -0.60
Mine         -0.59
disl         -0.59
Nieto        -0.59
toilets      -0.58
Mechanics    -0.58
Chic         -0.57
bully        -0.57
random       -0.57
Positive Logits
shadow       1.49
told         1.43
sight        1.41
closed       1.37
gone         1.33
warn         1.33
nsics        1.32
runner       1.30
seen         1.28
warning      1.27
Activations Density: 0.015%