INDEX
Explanations
expressions of personal opinions or reflections
references to personal thoughts and opinions
Negative Logits
ALL            -0.83
Mamm           -0.69
Availability   -0.61
gm             -0.61
ARDS           -0.60
Claim          -0.60
toe            -0.59
ards           -0.57
Ann            -0.57
aud            -0.57
Positive Logits
aloud          0.90
ileaks         0.86
cience         0.84
fulness        0.81
provoking      0.81
ynthesis       0.78
peed           0.78
cient          0.77
mares          0.77
thoughts       0.76
Activations Density 0.028%
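
A density figure like the one above is usually read as the share of sampled token positions on which the feature fires at all. The page does not say how the value was computed, so the sketch below is only an assumption about that definition: the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and exist purely for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition, not confirmed by the source: a position counts as
    "active" when its activation is strictly greater than `threshold`.
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 3 of which activate the feature.
sample = [0.0] * 10_000
for position, value in ((5, 1.2), (4_200, 0.4), (9_999, 2.7)):
    sample[position] = value

density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.028% would mean the feature is active on roughly 28 of every 100,000 sampled tokens.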