INDEX
Explanations
phrases emphasizing a positive or negative aspect of a situation
Negative Logits
uum            -0.88
othal          -0.85
largeDownload  -0.75
apter          -0.72
trak           -0.67
odan           -0.64
bol            -0.64
URI            -0.63
REF            -0.62
akedown        -0.60
Positive Logits
happen         1.09
happening      1.06
happened       1.05
EVER           0.94
ever           0.93
Happ           0.92
imaginable     0.88
happens        0.88
happ           0.85
Done           0.80
Activations Density: 0.087%