Explanations
verbs or phrases suggesting influence, impact, or assessment towards the subject matter
modal verbs and auxiliary verbs indicating possibilities or abilities
Negative Logits
Token     Logit
ooters    -0.71
ques      -0.64
anwhile   -0.63
remarks   -0.62
allas     -0.62
Sources   -0.61
Reports   -0.58
rams      -0.56
Che       -0.56
Cond      -0.56
Positive Logits
Token                Logit
natureconservancy    0.79
antit                0.72
bothered             0.69
bitten               0.68
:(                   0.68
fasc                 0.65
bothering            0.65
genuinely            0.64
)].                  0.63
STD                  0.62
Activations Density: 0.286%