Explanations
phrases related to intervention or taking action
phrases indicating intervention or involvement
Negative Logits
omnia       -0.73
unct        -0.66
proceeds    -0.65
Mavericks   -0.63
wealth      -0.62
tumblr      -0.60
Consolid    -0.58
MAT         -0.57
summary     -0.57
Soc         -0.57
Positive Logits
circle      0.79
elight      0.75
Desk        0.73
frog        0.72
lights      0.71
bounds      0.69
fray        0.69
stride      0.69
arsh        0.67
adr         0.67
Activations Density 0.075%