INDEX
Explanations
dissatisfaction or disapproval expressed through words like 'disappointed', 'disgusted', or 'disastrous'
words related to disappointment or discontent
Negative Logits
Token        Logit
Kinnikuman   -0.78
Reach        -0.75
glers        -0.70
Lans         -0.64
Alexandria   -0.64
Hedge        -0.64
eers         -0.63
hetti        -0.62
Actions      -0.61
Mortal       -0.61
Positive Logits
Token        Logit
cipline      1.10
placed       1.02
rup          1.02
comfort      1.02
cipl         1.01
patch        1.00
ruption      1.00
apers        0.99
licted       0.97
abled        0.97
Activations Density: 0.009%