INDEX
Explanations
the word "flash" at various activations
references to "flash."
Negative Logits
token      logit
employed   -0.79
lain       -0.76
thy        -0.74
nil        -0.74
mens       -0.73
avia       -0.71
Guth       -0.71
Kron       -0.68
Cohn       -0.68
Colo       -0.68
Positive Logits
token        logit
flash        3.83
flash        2.87
Flash        2.45
Flash        2.39
flashes      2.28
flashed      2.03
flashing     1.94
flashlight   1.55
blink        1.46
flashback    1.33
Activations Density 0.015%
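
To give a sense of scale for the density figure, the following is a minimal sketch that assumes "activations density" means the fraction of tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the toy data are illustrative assumptions and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density is "share of tokens where the feature fires".
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical): 3 active tokens out of 20,000.
toy = [0.0] * 19_997 + [1.2, 0.4, 3.1]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.015%
```

Under that assumption, a density of 0.015% corresponds to the feature firing on roughly 15 of every 100,000 tokens.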