INDEX
Explanations
phrases expressing curiosity or investigation
instances of curiosity and references to change or transformation
Negative Logits
lement      -0.69
Forge       -0.66
mentioned   -0.63
Issue       -0.62
mania       -0.61
marine      -0.60
elsen       -0.60
illon       -0.60
DEFENSE     -0.58
abase       -0.58
Positive Logits
so          1.67
so          1.56
SO          1.30
So          1.20
So          1.15
such        1.07
SO          1.05
such        0.96
cumbers     0.74
yet         0.72
Activations Density 0.119%