INDEX
Explanations
instances of the word "just."
Negative Logits
Token     Logit
reas      -0.17
combe     -0.16
åħ        -0.16
otton     -0.15
nez       -0.14
ÑĢол      -0.14
agara     -0.14
uw        -0.14
alink     -0.14
afka      -0.14
Positive Logits
Token     Logit
608       0.17
ifi       0.17
recently  0.16
IFI       0.16
ine       0.15
uby       0.15
chai      0.15
uss       0.15
phin      0.15
_hdl      0.15
Activations Density: 0.035%