Explanations
adjectives describing the difficulty or challenge of a situation
words related to difficulty or negative experiences
Negative Logits
iddles  -0.92
hops    -0.81
events  -0.78
ankind  -0.76
ateurs  -0.75
Surve   -0.75
ults    -0.74
ATURES  -0.73
ifles   -0.71
agents  -0.70
Positive Logits
foothold      1.04
amount        1.02
dose          0.97
relationship  0.95
impression    0.89
distinction   0.88
clue          0.88
path          0.87
tendency      0.87
solution      0.86
Activations Density: 0.335%