Explanations
action words indicating changes, improvements, or necessary steps to be taken
phrases that indicate actions or conditions related to responsibility and requirements
Negative Logits
natureconservancy   -0.73
huh                 -0.63
Hamb                -0.60
ahead               -0.60
idae                -0.59
ohan                -0.58
bothered            -0.58
Tracker             -0.58
badass              -0.58
yt                  -0.56
Positive Logits
uay                 0.70
"]=>                0.70
sole                0.63
ŃĶ                  0.62
};                  0.61
isse                0.60
kinson              0.60
eely                0.59
ril                 0.59
olith               0.58
Activations Density: 0.390%