INDEX
Explanations
seem troubled or simple, keep asking
Negative Logits
(“          1.14
(          1.05
:(         1.03
hilarious  1.01
awesome    0.99
amazing    0.99
lovingly   0.96
Awesome    0.95
(          0.95
hugely     0.95
Positive Logits
..."       1.43
…"         1.42
…”         1.39
...”       1.31
."         1.27
...")      1.23
.""        1.21
.."        1.19
,"         1.17
...        1.16
Activations Density 0.148%