Explanations
references to time-related phrases or durations
Negative Logits
Token    Logit
ACT      -0.15
kening   -0.14
eriod    -0.14
ë¦Ħ      -0.14
aras     -0.14
ieber    -0.14
&type    -0.14
Morm     -0.14
zim      -0.14
cheng    -0.14
Positive Logits
Token      Logit
199        0.19
198        0.18
childhood  0.17
197        0.16
IFI        0.16
aje        0.15
akk        0.15
196        0.14
Bol        0.14
201        0.14
Activations Density: 0.043%