Explanations
take risks, control, or advantage
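Negative Logits
triumphant: 0.38
データを (Japanese, "data" + object particle): 0.37
bot: 0.36
forth: 0.36
dermat: 0.36
マットレス (Japanese, "mattress"): 0.36
stadt (German, "city"): 0.35
up: 0.35
組み (Japanese, "set" or "assembly"): 0.35
passport: 0.34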
Positive Logits
Taken: 0.63
Taken: 0.59
taken: 0.57
taken: 0.57
Take: 0.54
TAKE: 0.53
Take: 0.52
TAK: 0.50
genomen (Dutch, "taken"): 0.47
seriously: 0.47
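Dashboards of this kind usually build the positive and negative logit lists by projecting the feature's decoder direction onto the model's unembedding and ranking vocabulary tokens by the resulting score. The sketch below assumes that convention; the helper name `top_logits`, the dictionary representation of the unembedding, and the toy vectors are all hypothetical and only illustrate the ranking step.

```python
import heapq
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]


def top_logits(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[Scored, Scored]:
    """Hypothetical helper: rank tokens by dot(decoder_dir, unembed[token]).

    Returns (positive, negative): the k highest-scoring and the k
    lowest-scoring (token, score) pairs.
    """
    # Score every vocabulary entry by its dot product with the decoder direction.
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    positive = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, scores.items(), key=lambda kv: kv[1])
    return positive, negative


# Toy two-dimensional example (made-up numbers, not the feature above).
decoder = [1.0, 0.0]
unembed = {
    "Taken": [0.6, 0.1],
    "take": [0.5, 0.0],
    "bot": [-0.4, 0.2],
    "up": [-0.3, 0.0],
}
pos, neg = top_logits(decoder, unembed, k=2)
print(pos)  # [('Taken', 0.6), ('take', 0.5)]
print(neg)  # [('bot', -0.4), ('up', -0.3)]
```

Using `heapq.nlargest`/`heapq.nsmallest` avoids sorting the full vocabulary when only the top few entries are displayed.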
Activation density: 0.014%
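Activation density is commonly read as the percentage of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0.0; the helper name `activation_density` and the toy data are hypothetical:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Toy data: 14 firing tokens out of 100,000 corresponds to 0.014%.
acts = [0.0] * 99_986 + [1.0] * 14
print(f"{activation_density(acts):.3f}%")  # prints "0.014%"
```

A density this low means the feature is silent on almost every token, which is typical of a narrowly scoped feature.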