INDEX
Explanations
statements expressing a call to action or emphasis
instances of introductory phrases indicating personal perspective or statements
Negative Logits
goto        -0.72
CVE         -0.67
bladder     -0.64
RTX         -0.59
Vita        -0.57
Reef        -0.57
nap         -0.56
mercury     -0.56
LV          -0.55
ãģŁ         -0.55
Positive Logits
anmar        0.99
odore        0.86
jriwal       0.80
ulty         0.78
prosec       0.78
intendent    0.76
foundland    0.76
htaking      0.76
anyahu       0.76
initely      0.73
Activations Density 0.187%