INDEX
Explanations
references to burning or being burned
Negative Logits
Fauc    -0.19
ibel    -0.15
akt     -0.15
ials    -0.15
531     -0.15
oo      -0.15
215     -0.15
ve      -0.15
651     -0.15
aden    -0.14
Positive Logits
alive     0.30
ished     0.28
дот       0.25
alive     0.22
ISHED     0.22
-toast    0.22
Alive     0.22
ishing    0.21
bridges   0.20
inated    0.20
Activations Density 0.032%