Explanations
"Okay," explanation starter phrase
Negative Logits
Token      Value
6          1.04
theater    1.01
7          0.96
2          0.95
bagian     0.95
5          0.94
foss       0.94
astu       0.93
втори      0.93
8          0.93
Positive Logits
Token        Value
extending    1.32
Extended     1.13
leveraging   1.10
extended     1.06
Nox          1.06
extend       1.05
reinforcing  1.05
granting     1.05
retrieve     1.03
malicious    1.02
Activations Density: 0.011%