Explanations
references to individuals involved in various situations or incidents
Negative Logits
ONTAL       -0.15
when        -0.15
themselves  -0.14
lfw         -0.14
,},↵        -0.14
_when       -0.14
ëĭĪê¹Į      -0.13
_contains   -0.13
å½ĵ         -0.13
aren        -0.12
Positive Logits
from          0.34
living        0.31
working       0.30
belonging     0.30
with          0.27
residing      0.26
operating     0.25
specializing  0.25
representing  0.24
wearing       0.22
Activations Density 0.213%