INDEX
Explanations
phrases related to actions causing a response or reaction
phrases indicating a cause-and-effect relationship or actions that trigger responses
Negative Logits
mate       -0.75
atum       -0.71
sm         -0.66
rend       -0.65
lat        -0.65
ighth      -0.64
quart      -0.63
breaker    -0.61
route      -0.61
uded       -0.61
Positive Logits
laughter          0.92
prompt            0.88
outcry            0.88
inquiries         0.86
applause          0.82
warnings          0.82
speculation       0.80
comparisons       0.79
widespread        0.77
investigations    0.76
Activations Density: 0.077%