INDEX
Explanations
concepts related to subjective importance and significance
Negative Logits
Token	Logit
↵	-0.34
s	-0.31
тро	-0.30
“	-0.30
"	-0.30
"	-0.29
<eos>	-0.29
devriez	-0.29
sigo	-0.28
不	-0.28
Positive Logits
Token	Logit
témoig	0.76
queſta	0.72
beſch	0.72
Normdatei	0.70
idlertid	0.69
aimeJ	0.69
ſelben	0.68
ſcher	0.68
ロウィン	0.68
باردا	0.68
Activations Density 0.042%