INDEX
Explanations
mentions of the word "Metal" with variations in capitalization and suffixes
references to "metal" in various contexts
NEGATIVE LOGITS
Suc         -0.72
zee         -0.70
Rowling     -0.67
erate       -0.66
========    -0.65
Nurs        -0.63
Enlight     -0.62
Grande      -0.62
yip         -0.62
renheit     -0.60
POSITIVE LOGITS
anguage     1.67
detectors   1.35
detector    1.21
oxide       1.20
works       1.10
working     1.10
heads       1.06
bending     1.03
worker      0.99
workers     0.98
Activations Density 0.018%
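
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled token positions on which the feature's activation is above zero; the page does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all, not an average activation strength.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Hypothetical data: 10,000 token positions, 2 of which activate the feature.
sample = [0.0] * 9998 + [3.2, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.018% would mean the feature fires on roughly 18 of every 100,000 sampled tokens.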