INDEX
Explanations
phrases emphasizing a specific idea or point
references to important messages or themes
Negative Logits
Token       Logit
engeance    -0.73
ords        -0.71
urses       -0.68
erenn       -0.68
ENCY        -0.67
agonists    -0.67
enses       -0.65
itates      -0.65
umbers      -0.65
endars      -0.64
Positive Logits
Token       Logit
message     1.05
messages    1.00
Messages    0.91
message     0.91
conveyed    0.87
board       0.83
posts       0.83
FontSize    0.82
goodbye     0.80
Message     0.77
Activations Density: 0.025%