Explanations
references to actions and events involving the word "take."
Negative Logits
Ïİν       -0.16
ем        -0.15
aby       -0.15
gage      -0.15
emics     -0.15
blade     -0.14
лаÑĪ      -0.14
afx       -0.14
camp      -0.14
ãĤ³ãĥ³    -0.14
Positive Logits
olina     0.19
uel       0.17
OL        0.15
oo        0.15
azo       0.15
FF        0.15
owski     0.14
Ùĥت      0.14
441       0.14
ily       0.14
Activations Density 0.021%