INDEX
Explanations
references to physical bodily conditions or intense emotions
terms related to emotional distress or physical harm
Negative Logits
izons      -0.81
isites     -0.75
ient       -0.70
redit      -0.69
vantage    -0.68
rists      -0.67
agonist    -0.66
atars      -0.65
zzy        -0.65
aeus       -0.65
Positive Logits
guise        1.00
circles      0.91
manner       0.85
form         0.77
mode         0.76
folklore     0.74
fashion      0.74
estimation   0.72
territory    0.70
contexts     0.70
Activations Density: 0.381%