INDEX
Explanations
specific nouns or adjectives
Negative Logits
ihad            0.53
Hence           0.53
Zudem           0.49
كما             0.48
Admittedly      0.48
Unfortunately   0.48
Similarly       0.47
¹               0.47
Because         0.47
¹.              0.47
Positive Logits
newest          0.72
possibility     0.66
distinction     0.63
quantity        0.63
fact            0.60
latest          0.58
phrase          0.58
authenticity    0.57
greatest        0.56
hyperlink       0.55
Activations Density 0.000%