Explanations
personal pronouns and verbs indicating discourse or thought
first-person statements and expressions of personal opinion
Negative Logits
Mile          -0.61
Monthly       -0.60
ILCS          -0.58
Uniform       -0.58
Afterwards    -0.57
Scroll        -0.57
Offline       -0.57
undisclosed   -0.56
Delicious     -0.56
unknown       -0.56
Positive Logits
Rather         1.02
merely         1.02
gemony         1.01
simply         0.92
Instead        0.86
issions        0.85
mere           0.82
sole           0.81
Rather         0.81
're            0.79
Activation Density: 0.189%