INDEX
Explanations
negative words depicting failures or problems
negative descriptors related to failure and disappointment
NEGATIVE LOGITS
ancies      -0.98
Sources     -0.79
icts        -0.77
rams        -0.77
anooga      -0.75
ensions     -0.72
profiles    -0.72
eways       -0.72
Roots       -0.71
fixes       -0.71
POSITIVE LOGITS
unto         1.09
compared     0.87
affair       0.83
worthy       0.79
akin         0.78
breaker      0.78
nonetheless  0.76
reel         0.75
worth        0.73
unworthy     0.73
Activations Density: 0.267%