Explanations
references to notable individuals and their contributions or characteristics
Negative Logits
attending    -0.16
anner        -0.15
งาน          -0.14
_NB          -0.14
267          -0.14
olit         -0.14
Guest        -0.14
ozí          -0.14
egment       -0.14
induction    -0.14
Positive Logits
Direct       0.17
specialist   0.15
direct       0.15
دیگر         0.15
Respons      0.15
cky          0.15
vice         0.15
chef         0.15
Direct       0.15
delegate     0.15
Activations Density: 0.017%