INDEX
Explanations
text fragments with unusual characters or symbols
references to specific brands or companies
Negative Logits
Token      Logit
Osc        -0.83
EG         -0.76
Sony       -0.70
stacked    -0.68
Benz       -0.66
jew        -0.65
Morg       -0.65
Spoiler    -0.65
Sony       -0.64
Loot       -0.64
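Logit lists like these are commonly computed as the feature's direct effect on the output: the feature's decoder direction is multiplied by the model's unembedding matrix, and the resulting per-token values are sorted. The dashboard does not state its exact method, so the following is only a minimal sketch under that assumption. It uses toy dimensions and hypothetical names (`top_logit_effects`, `decoder_direction`, `unembed`, `vocab`) and ignores the final layer norm.

```python
def top_logit_effects(decoder_direction, unembed, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    Hypothetical helper. Assumes:
      decoder_direction: list of d_model floats (the feature's decoder row)
      unembed: d_model rows, each a list of vocab_size floats (W_U)
      vocab: list of vocab_size token strings
    The final layer norm is ignored in this sketch.
    """
    d_model = len(decoder_direction)
    # Effect on token j's logit = dot(decoder_direction, column j of W_U).
    effects = [
        sum(decoder_direction[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    most_negative = ranked[:k]        # ascending: most negative first
    most_positive = ranked[::-1][:k]  # descending: most positive first
    return most_negative, most_positive


# Toy example: d_model = 2, vocabulary of 4 tokens.
decoder_direction = [1.0, -0.5]
unembed = [
    [2.0, 0.0, -1.0, 0.5],
    [0.0, 1.0, 2.0, -1.0],
]
vocab = ["a", "b", "c", "d"]
negative, positive = top_logit_effects(decoder_direction, unembed, vocab, k=2)
print(negative)  # [('c', -2.0), ('b', -0.5)]
print(positive)  # [('a', 2.0), ('d', 1.0)]
```

With a real model, `unembed` would have tens of thousands of columns and `k=10` would reproduce a ten-row list like the one above.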
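Positive Logits
Token   Decoded (byte-level BPE)   Logit
Äģ      ā                          3.94
Ä«      ī                          3.08
Å«      ū                          2.46
Äĵ      ē                          2.31
á¹      partial: bytes E1 B9       1.86
Åį      ō                          1.65
Ç       partial: byte C7           1.64
á¸      partial: bytes E1 B8       1.51
Ê       partial: byte CA           1.41
Ä       partial: byte C4           1.39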
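The positive-logit tokens look garbled because, assuming the model uses a GPT-2-style byte-level BPE vocabulary (the page does not name the model), each vocabulary string stores raw bytes as printable stand-in characters. The sketch below rebuilds that byte-to-character table, following the scheme of GPT-2's `bytes_to_unicode`, inverts it, and decodes a token back to text. `token_to_text` and `CHAR_TO_BYTE` are hypothetical names. Tokens that hold only part of a multi-byte UTF-8 character cannot be decoded on their own, so the helper reports their bytes instead.

```python
def bytes_to_unicode():
    """Byte -> character table used by GPT-2-style byte-level BPE vocabularies."""
    # Bytes that are already printable map to themselves.
    byte_values = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    code_points = byte_values[:]
    # Every other byte (control chars, space, etc.) is shifted to 256 + n.
    n = 0
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: vocabulary character -> raw byte.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_text(token):
    """Decode a vocabulary string; report byte fragments that are not valid UTF-8."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # A lone lead byte or an incomplete multi-byte sequence.
        return "partial UTF-8: " + raw.hex(" ").upper()


for tok in ["Äģ", "Ä«", "Å«", "Äĵ", "á¹", "Åį", "Ç"]:
    print(tok, "->", token_to_text(tok))
```

Under this assumption, the loop prints `ā`, `ī`, `ū`, `ē`, `partial UTF-8: E1 B9`, `ō` and `partial UTF-8: C7`. In other words, the feature mostly promotes vowels with macrons and fragments of other accented Latin characters.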
Activation Density: 0.015%
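Activation density is usually the share of sampled tokens on which the feature activates above zero. The page does not state its threshold or sample size, so that definition is an assumption here. Under it, 0.015% works out to about 15 activating tokens per 100,000, or roughly 150 per million. The following is a minimal sketch of the calculation, using a hypothetical helper named `activation_density`:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which the feature's activation exceeds `threshold`.

    Assumes `activations` holds one activation value per token in the sample.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 15 active tokens out of 100,000 gives the 0.015% shown above.
sample = [0.0] * 99_985 + [1.2] * 15
print(activation_density(sample))  # 0.015
```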