Explanations
verbs related to boasting or bragging
instances of boasting or bragging behavior
Negative Logits
Ͻ           -0.70
ãĥĺãĥ©      -0.67
ots         -0.64
ãĥ¯ãĥ³      -0.64
20439       -0.63
perature    -0.62
pora        -0.62
ãĤ«         -0.62
ella        -0.61
ãĥ¼ãĥ«      -0.61
Positive Logits
doms         0.99
loudly       0.90
confidently  0.88
about        0.86
boasted      0.81
ously        0.80
ially        0.77
edly         0.77
ulates       0.76
antly        0.76
Activations Density: 0.054%
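
Several entries in the negative logits list (ãĥĺãĥ©, ãĥ¯ãĥ³, ãĤ«, ãĥ¼ãĥ«) look like raw byte-level BPE vocabulary strings rather than readable text. The sketch below shows one way to turn them back into text. It assumes the tokenizer uses the GPT-2-style byte-to-unicode table, which this page does not confirm; the function names are hypothetical and not part of any dashboard API.

```python
def byte_to_unicode_table():
    """Rebuild the byte -> character table used by GPT-2-style byte-level BPE.

    Assumption: the feature's tokenizer follows this scheme; the page
    itself does not name the model or tokenizer.
    """
    # Bytes that are shown as themselves in the vocabulary.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is shifted to a code point at 256 or above.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: vocabulary character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in byte_to_unicode_table().items()}


def decode_vocab_string(token):
    """Map a vocabulary string back to UTF-8 text (hypothetical helper).

    Strings containing characters outside the table are returned unchanged.
    """
    if any(ch not in CHAR_TO_BYTE for ch in token):
        return token
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # Partial multi-byte sequences become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for token in ["ãĥĺãĥ©", "ãĥ¯ãĥ³", "ãĤ«", "ãĥ¼ãĥ«", "perature"]:
        print(repr(token), "->", repr(decode_vocab_string(token)))
```

Under that assumption, the four strings decode to katakana fragments (ヘラ, ワン, カ, ール), while plain ASCII pieces such as perature pass through unchanged.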