INDEX
Explanations
phrases related to negative actions or events
negative phrases that imply criticism or conflict
Negative Logits
Token             Logit
ously             -0.75
HL                -0.64
edo               -0.61
………………            -0.60
/(                -0.58
oks               -0.58
xes               -0.57
entials           -0.57
:[                -0.56
.):               -0.56
Positive Logits
Token             Logit
_-                1.74
webkit            0.89
=-=-=-=-=-=-=-=-  0.83
ie                0.77
[|                0.76
/-                0.74
=-=-=-=-          0.74
named             0.71
cens              0.69
enough            0.69
Activations Density 0.066%