INDEX
Explanations
instances and variations of the word "for" in different contexts
Negative Logits
ix            -0.16
confirmation  -0.14
background    -0.14
set           -0.14
hol           -0.14
ono           -0.14
trap          -0.14
obs           -0.14
Bieber        -0.14
PT            -0.13
Positive Logits
erah      0.20
ocht      0.17
riere     0.15
ÑĤоÑĦ     0.15
cher      0.15
ẫn        0.15
-article  0.15
icher     0.15
wner      0.15
edith     0.15
Activations Density 0.010%