Explanations
Occurrences of the word "For", especially where it opens an introductory phrase or a condition, across a variety of contexts.
Negative Logits
ef        -0.18
ty        -0.16
een       -0.15
tw        -0.15
eb        -0.15
387       -0.15
aven      -0.14
erable    -0.14
ties      -0.14
tor       -0.14
Positive Logits
ums       0.16
lags      0.16
chio      0.15
-syntax   0.15
forum     0.15
щ         0.15
okus      0.14
�名       0.14
acho      0.14
문의      0.14
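Feature dashboards of this kind commonly derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below assumes that approach; `top_logit_effects`, `decoder_vec`, and `unembed` are hypothetical names rather than this dashboard's API, details such as folding in the final layer norm are ignored, and the toy numbers are invented for illustration only.

```python
import numpy as np

def top_logit_effects(decoder_vec, unembed, vocab, k=10):
    """Return the k most negative and k most positive per-token logit effects.

    Assumed shapes (hypothetical, for illustration):
      decoder_vec: (d_model,)             the feature's decoder direction
      unembed:     (d_model, vocab_size)  the model's unembedding matrix
      vocab:       list of vocab_size token strings
    """
    # Direct effect of the feature direction on every token's logit.
    logit_effect = decoder_vec @ unembed          # shape (vocab_size,)
    order = np.argsort(logit_effect)              # ascending order of effect
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
decoder_vec = np.array([1.0, 0.0])
unembed = np.array([[0.5, -0.2, 0.1, -0.9],
                    [0.3, 0.7, -0.4, 0.2]])
neg, pos = top_logit_effects(decoder_vec, unembed, vocab, k=2)
print(neg)  # [('d', -0.9), ('b', -0.2)]
print(pos)  # [('a', 0.5), ('c', 0.1)]
```

Under this reading, a token like "ef" scoring -0.18 means the feature's direction slightly lowers that token's logit when the feature is active.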
Activation Density: 0.063%
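Activation density is conventionally the fraction of token positions on which a feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the dashboard's actual threshold is not stated) and using a hypothetical `activation_density` helper: a density of 0.063% corresponds to roughly 63 firing positions per 100,000 tokens.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    activations = np.asarray(activations)
    return float((activations > threshold).mean())

# Toy data: the feature fires on 63 of 100,000 token positions.
acts = np.zeros(100_000)
acts[:63] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.063%
```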