Explanations
instances of the word "from," indicating discussions of origin or background
Negative Logits
Token    Logit
fare     -0.15
riel     -0.14
2        -0.14
YP       -0.14
odo      -0.14
olo      -0.14
lesh     -0.14
1        -0.14
20       -0.14
3        -0.13
Positive Logits
Token      Logit
linky      0.16
pig        0.15
à¥Įत       0.15
emachine   0.14
cycle      0.14
orris      0.14
RULE       0.14
uez        0.14
xiety      0.14
AndWait    0.14
Activations Density 0.021%
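
The density figure above is presumably the share of token positions on which this feature fires. The page does not define it, so the sketch below only illustrates that reading. It assumes "fires" means an activation strictly above a threshold of zero. The function name `activation_density` and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly greater
    than the threshold (0.0 by default). The dashboard may use another rule.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 10,000 token positions, 2 of which activate the feature.
sample = [0.0] * 9998 + [1.3, 0.4]
density = activation_density(sample)

# Format as a percentage with three decimals, like the dashboard figure.
print(f"{density:.3%}")
```

A density this low would mean the feature is sparse: it is active on only a few positions per ten thousand tokens.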