INDEX
Explanations
explaining to yourself or me
Negative Logits
Individuals	0.85
تحمل	0.85
నగ	0.85
ehemalige	0.83
einiger	0.82
الشخص	0.82
あの	0.81
افر	0.81
COME	0.80
該	0.79
Positive Logits
inquisitive	1.10
uninformed	0.99
curious	0.92
skeptical	0.92
interested	0.87
sceptical	0.87
inquiring	0.87
audiences	0.86
skept	0.83
perplexed	0.82
Activations Density: 0.028%