INDEX
Explanations
expressions of various emotions or attitudes towards a specific topic or situation
expressions of concern or disapproval regarding various issues
Negative Logits
Goodwin      -0.69
Blooming     -0.63
constructed  -0.62
Narr         -0.60
Glover       -0.59
Grain        -0.59
sear         -0.58
waterfall    -0.57
shoe         -0.56
agog         -0.56
Positive Logits
rompt        0.81
bia          0.76
idad         0.76
llah         0.74
ociation     0.73
ilty         0.71
isexual      0.71
Letter       0.70
displeasure  0.70
oche         0.69
Activations Density: 0.163%