Explanations
references to knowledge and understanding of details, particularly in specific contexts
Negative Logits
urette        -0.18
gili          -0.15
#__           -0.15
.BL           -0.14
éo            -0.14
arton         -0.14
ÏĢοÏħ         -0.14
WithDuration  -0.13
_PAD          -0.13
nown          -0.13
Positive Logits
intimately    0.47
better        0.41
well          0.40
intimate      0.35
firsthand     0.32
better        0.31
well          0.28
mieux         0.27
intim         0.27
backwards     0.27
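The two lists above rank tokens by how strongly this feature pushes their logits down or up. Dashboards of this kind commonly compute these scores by projecting the feature's decoder direction through the model's unembedding matrix and taking the k lowest and k highest values. The sketch below assumes that convention, ignores any final LayerNorm, and uses toy weights. `logit_effects`, `decoder_vec` and `unembed` are hypothetical names, not a specific library's API.

```python
from typing import List, Sequence, Tuple

def logit_effects(
    decoder_vec: Sequence[float],            # hypothetical: feature's decoder direction, length d_model
    unembed: Sequence[Sequence[float]],      # hypothetical: unembedding, shape [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit effect) pairs."""
    d_model = len(decoder_vec)
    # Direct logit effect of the feature on each token:
    # effect[t] = sum_i decoder_vec[i] * unembed[i][t]
    effects = [
        sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by effect; the head is most negative, the tail most positive.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 3 tokens.
decoder = [1.0, -0.5]
unembed = [[0.2, -0.1, 0.4],
           [0.4, 0.3, -0.2]]
neg, pos = logit_effects(decoder, unembed, ["well", "urette", "better"], k=1)
print(neg, pos)
```

In this toy setup "urette" gets the most negative effect (about -0.25) and "better" the most positive (about 0.5), mirroring the shape of the lists above without reproducing the real model's weights.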
Activation density: 0.162%
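Activation density is usually reported as the share of sampled tokens on which the feature's activation is above zero. The sketch below assumes that definition, with a strict `> 0` threshold; `activation_density` is a hypothetical helper, not a dashboard API. At 0.162%, the feature would fire on roughly 162 of every 100,000 tokens, or about 1 token in 617.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# 162 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 99_838 + [1.3] * 162
print(f"{activation_density(acts):.3f}%")
```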