INDEX
Explanations
variations of the word "you" and its contextual references
Negative Logits
µ          -0.16
oir        -0.15
on         -0.15
antly      -0.14
ader       -0.14
Besch      -0.14
gratuit    -0.13
frei       -0.13
itory      -0.13
autop      -0.13
Positive Logits
dsn        0.17
ÄįenÃŃ     0.16
ulla       0.15
uali       0.15
ìļ¸        0.15
olan       0.15
acity      0.15
arter      0.14
lete       0.14
ELLOW      0.14
Activations Density 0.026%