INDEX
Explanations
adjective + noun combinations
Negative Logits
diriger           -1.56
exceedingly       -1.54
både              -1.52
Anybody           -1.46
solamente         -1.45
faciliter         -1.43
siè               -1.35
frequentemente    -1.34
genoux            -1.32
BOTH              -1.31
Positive Logits
in                1.88
with              1.84
at                1.78
that              1.74
and               1.59
during            1.59
difficult         1.58
relatively        1.49
this              1.48
really            1.46
Activations Density: 0.056%