INDEX
Explanations
the word "amazing" with relatively high activation values
expressions of admiration or enthusiasm
Negative Logits
©¶æ             -0.83
epend           -0.82
pai             -0.82
embed           -0.77
vere            -0.77
eter            -0.76
aper            -0.76
avis            -0.74
few             -0.72
enser           -0.71
Positive Logits
NESS            0.87
rendition       0.86
coincidence     0.85
feats           0.85
feat            0.83
amazing         0.75
talent          0.75
accomplishment  0.75
amounts         0.74
incredible      0.71
Activations Density: 0.029%
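
As a rough illustration of how a density figure like this can be read, the sketch below assumes that "activations density" means the fraction of tokens on which the feature fires (activation above zero). That definition, the function name `activation_density`, and the sample data are assumptions for illustration, not something this page states.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition of "density"; the page itself does not spell it out.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 tokens, 29 of which activate the feature.
acts = [0.0] * 99_971 + [1.5] * 29
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.029%"
```

Under this reading, a density of 0.029% corresponds to roughly 29 active tokens per 100,000, which fits a feature that responds to a narrow set of words.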