INDEX
Explanations
instances of denial or negation in statements
Negative Logits
osgi            -0.54
rikes           -0.52
TestBed         -0.51
Majefty         -0.50
pleaſure        -0.49
"}}             -0.49
ExecuteAsync    -0.48
leaſt           -0.48
octet           -0.48
Arki            -0.48
Positive Logits
[token not captured]    1.17
subreddit               1.06
[token not captured]    1.01
[token not captured]    0.98
[token not captured]    0.94
subreddits              0.90
r                       0.85
redd                    0.81
reddits                 0.80
subreddit               0.77
Activations Density 0.537%