INDEX
Explanations
phrases indicating a response or reaction to something
instances of the word "respond" and its variations
Negative Logits
ILCS     -0.75
utical   -0.72
hered    -0.70
Tool     -0.70
rek      -0.67
sky      -0.66
rette    -0.66
caps     -0.65
lim      -0.65
tool     -0.65
Positive Logits
criticisms    0.97
criticism     0.97
requests      0.91
queries       0.82
inquiries     0.80
stimuli       0.78
allegations   0.78
critics       0.76
questions     0.75
complaints    0.74
Activations Density 0.092%