INDEX
Explanations
instances of official communication like letters, statements, and memos
documents or communications such as letters and statements
Negative Logits
Token       Logit
sake        -0.59
putable     -0.57
ayed        -0.57
Thieves     -0.56
distant     -0.56
rising      -0.56
newcomers   -0.56
lanes       -0.55
pires       -0.55
cakes       -0.54
POSITIVE LOGITS
Token         Logit
outlining     0.87
thanking      0.86
titled        0.81
apologizing   0.77
stating       0.76
denouncing    0.74
recommending  0.73
saying        0.73
alleging      0.73
condem        0.73
Activations Density 0.254%
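
A density figure like the one above is usually read as the fraction of sampled token positions on which the feature fires at all. The sketch below illustrates that reading only; the function name, the zero threshold, and the toy activations are assumptions made for illustration, not the tool's actual computation or data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (0.0 here).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 2,000 token positions, 5 of which are active.
toy = [0.0] * 1995 + [0.4, 1.2, 0.7, 2.1, 0.3]
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density well below 1% means the feature is sparse: it is silent on the vast majority of tokens and fires only in narrow contexts, such as the verbs that introduce the content of a letter or statement.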