INDEX
Explanations
instances of the word "take" and its various forms
Negative Logits
xes       -0.15
ãģĦãģĦ    -0.15
anity     -0.14
UIT       -0.14
گر        -0.14
uisse     -0.14
under     -0.14
rox       -0.13
orang     -0.13
raith     -0.13
Positive Logits
aways            0.17
into             0.14
advantage        0.14
inch             0.14
иболее           0.14
responsibility   0.14
ALI              0.13
bart             0.13
Flight           0.13
/sub             0.13
Activations Density 0.162%
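
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is computed: the fraction of token positions on which the feature's activation is above zero. This is an assumption about the metric, not a description of how this dashboard derived 0.162%. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1,000 token positions, 2 of which activate the feature.
sample = [0.0] * 998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.162% would correspond to the feature firing on roughly 16 of every 10,000 tokens.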