INDEX
Explanations
positive adjectives describing things or actions
positive and favorable descriptions of various subjects
Negative Logits
insula        -0.62
ivities       -0.61
alcoholism    -0.58
odan          -0.58
oggle         -0.57
guyen         -0.57
ership        -0.56
etus          -0.56
ruction       -0.55
elist         -0.55
Positive Logits
enough         0.89
compared       0.76
considering    0.74
anyway         0.74
ISH            0.73
aest           0.72
enough         0.72
ly             0.71
bones          0.69
ingly          0.68
Activations Density: 0.124%