Explanations
phrases suggesting introducing a new concept or idea
phrases related to borrowing and quoting
Negative Logits
arnaev      -0.67
utenberg    -0.65
commit      -0.65
uador       -0.64
acca        -0.60
ebra        -0.60
uterte      -0.60
HRC         -0.60
20439       -0.59
owa         -0.58
Positive Logits
...)        0.70
_.          0.67
*)          0.66
…)          0.66
inclined    0.65
paraph      0.64
Byz         0.63
charism     0.62
,)          0.60
arg         0.60
Activations Density 0.300%