Explanations
describes simplicity or directness
Negative Logits
şeyler  0.49
fortale  0.48
経営  0.45
важли  0.44
doenças  0.43
oorlog  0.43
další  0.42
nogle  0.42
assuntos  0.42
चुनौतियों  0.42
Positive Logits
simply  0.72
simple  0.68
semplicemente  0.63
simple  0.60
simplemente  0.60
只需  0.59
straightforward  0.59
simply  0.59
exactly  0.58
সরাসরি  0.57
Activations Density 0.255%