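Negative Logits
Character: 0.54
캐릭터: 0.52 (Korean: "character")
Bek: 0.52
Anna: 0.50
Drain: 0.49
Từ: 0.49 (Vietnamese: "word"; also "from")
Antes: 0.48 (Spanish/Portuguese: "before")
Câu: 0.48 (Vietnamese: "sentence")
Begriff: 0.47 (German: "term, concept")
Nella: 0.47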
Positive Logits
at: 0.49
MA: 0.48
Duh: 0.45
SA: 0.45
بوط: 0.44
cov: 0.43
loi: 0.43
adjusts: 0.43
hu: 0.43
halfway: 0.43
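The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding and take the top-k and bottom-k entries. The sketch below uses plain Python; `logit_effects`, `top_logit_effects`, and the toy vectors are hypothetical illustrations, and it returns signed values, so negative effects come out below zero.

```python
import heapq
from typing import Dict, List, Tuple


def logit_effects(decoder_row: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature's decoder row with each token's unembedding column.

    Assumption: the per-token effect is the direct (linear) contribution of the
    feature direction to that token's logit; no layer norm is modeled here.
    """
    return {
        token: sum(d * u for d, u in zip(decoder_row, column))
        for token, column in unembed.items()
    }


def top_logit_effects(
    effects: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most positive k, most negative k) tokens by signed effect."""
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return positive, negative


# Hypothetical toy example: a 2-dimensional residual stream and three tokens.
decoder = [1.0, -0.5]
unembedding = {
    "at": [0.4, -0.2],
    "Character": [-0.3, 0.5],
    "halfway": [0.2, 0.0],
}
pos, neg = top_logit_effects(logit_effects(decoder, unembedding), k=1)
print(pos, neg)
```

In a real model the unembedding has tens of thousands of columns, so a vectorized matrix product would replace the dictionary comprehension; the ranking step is the same.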
Activation density: 0.006%
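An activation density of 0.006% means the feature is active on roughly 6 of every 100,000 tokens. The short sketch below shows one way such a figure could be computed, assuming "active" means an activation strictly above zero; the function name `activation_density` and the `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active on a token when its activation is
    strictly greater than the threshold (zero by default).
    """
    total = 0
    firing = 0
    for value in activations:
        total += 1
        if value > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * firing / total


# Hypothetical sample: 6 active positions out of 100,000 tokens.
sample = [0.0] * 99_994 + [1.2] * 6
print(activation_density(sample))
```

Streaming over the activations keeps memory constant, which matters when density is estimated over millions of tokens.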