Negative Logits
Token        Logit
Stim         0.78
resembled    0.73
->$          0.71
resemble     0.70
resembles    0.70
닦           0.69
그러면       0.68
니아         0.67
ontst        0.66
확인함       0.65
Positive Logits
Token        Logit
factored     1.48
plays        1.43
сыгра        1.41
factor       1.36
play         1.30
factor       1.29
mattered     1.28
Factor       1.27
played       1.27
因素         1.25
Activations Density: 0.433%