INDEX
Explanations
negative sentiments or strong opinions
phrases indicating motivations or intentions, particularly regarding actions or behaviors
Negative Logits
arbon      -0.62
anian      -0.59
greg       -0.58
anos       -0.58
Gujar      -0.58
menace     -0.55
igl        -0.54
famous     -0.54
ificial    -0.54
chip       -0.54
Positive Logits
antry      0.56
toward     0.55
earnest    0.55
bribes     0.53
motives    0.53
altru      0.53
ooter      0.52
sincere    0.52
diligence  0.51
esty       0.50
Activations Density 1.130%