INDEX
Explanations
references to memorabilia or the act of memorization
Negative Logits
IRO           -0.68
OHN           -0.66
Prometheus    -0.61
roxy          -0.60
methane       -0.60
Conclusion    -0.60
Canary        -0.60
Wenger        -0.60
Simulator     -0.59
ACS           -0.59
Positive Logits
abilia        1.64
ably          1.06
ographed      1.05
ific          1.02
memor         1.01
istically     1.00
icol          0.95
ized          0.95
ruction       0.93
brance        0.93
Activations Density: 0.008%