Explanations
references to significant historical figures and their quotes
Negative Logits
and         -0.15
next        -0.15
which       -0.15
ương        -0.15
instead     -0.15
opposite    -0.14
ounced      -0.14
ogh         -0.14
Clo         -0.13
kę          -0.13
Positive Logits
quoted      0.26
quoted      0.25
quote       0.21
Quotes      0.19
-quote      0.19
摘          0.18
paraph      0.18
quotes      0.18
_quote      0.17
quoting     0.17
Activation Density: 0.068%