INDEX
Explanations
occurrences of the word "I" and its variations to identify personal statements or opinions
Negative Logits
spark      -0.23
/Sub       -0.22
STEM       -0.21
/spec      -0.21
slider     -0.21
Scanner    -0.21
spinner    -0.20
Salisbury  -0.20
Spark      -0.20
Salvador   -0.20
Positive Logits
s      0.29
s      0.27
_S     0.27
Âłs    0.27
स      0.26
ÂłS    0.25
S      0.25
ãĤ¹    0.23
-S     0.23
س      0.23
Activations Density 0.110%