Explanations
adjectives and descriptors for various situations and experiences
expressions of strong emotional reactions or opinions
Negative Logits
Token       Logit
ð           -0.68
·           -0.67
tains       -0.66
lio         -0.66
Featured    -0.66
continue    -0.63
pro         -0.62
donald      -0.62
bies        -0.61
("          -0.61
Positive Logits
Token       Logit
certainly   0.83
ngth        0.80
definitely  0.71
gonna       0.67
gotta       0.67
Journals    0.66
defin       0.66
oret        0.65
yss         0.62
ĸļ          0.62
Activations Density: 0.806%