INDEX
Explanations
explicit mentions of something without directly naming it
words related to explicit or formal declarations and statements
Negative Logits
heart         -0.72
throats       -0.68
Posts         -0.67
Memories      -0.65
Garry         -0.65
favourites    -0.64
GER           -0.63
vocabulary    -0.63
jaws          -0.61
enthus        -0.61
Positive Logits
addressed      0.99
detonated      0.87
contradicted   0.84
challenged     0.81
endorsed       0.79
denounced      0.78
defended       0.77
declare        0.76
testified      0.76
preceded       0.75
Activation Density: 0.049%