INDEX
Explanations
phrases related to reactions or replies in a text
instances of the word "response" and variations indicating replies or reactions to events or statements
Negative Logits
rome     -0.77
ffe      -0.70
cutting  -0.69
knots    -0.68
hemat    -0.67
ramer    -0.66
fo       -0.65
rust     -0.64
roots    -0.63
bound    -0.63
Positive Logits
thereto         1.00
response        1.00
responses       0.91
reaction        0.88
response        0.82
reply           0.81
respond         0.79
Response        0.78
reply           0.77
DragonMagazine  0.77
Activations Density 0.030%