Explanations
phrases involving the word "take" or variations of it
Negative Logits
thern     -0.17
uju       -0.15
udas      -0.15
jad       -0.15
ilater    -0.15
taj       -0.15
PU        -0.15
idth      -0.15
rine      -0.15
ime       -0.14
Positive Logits
advantage   0.36
aways       0.24
seriously   0.22
uchi        0.20
advant      0.20
refuge      0.20
charge      0.20
Liberties   0.19
Advantage   0.19
liberties   0.18
Activations Density: 0.113%