Explanations
advocated, advocating, voluntarily, persuade
Negative Logits
đẳng    0.47
🏦    0.44
中部    0.44
玫瑰    0.43
㴓    0.43
आई    0.43
ائز    0.43
탱크    0.42
উত্তর    0.41
সূত্রে    0.41
Positive Logits
advocated    0.47
advocating    0.43
თავ    0.42
refresh    0.39
pledge    0.39
ulc    0.38
merely    0.38
voluntarily    0.38
persuade    0.38
persuaded    0.37
Activations Density: 0.008%