Explanations
Mentions of discussions, advancements, science, controversy, and emotional experiences in various contexts.
Negative Logits
Token     Logit
reper     -0.67
Nare      -0.62
adv       -0.61
brim      -0.61
crawl     -0.61
uana      -0.58
traged    -0.52
ACTION    -0.52
ULTS      -0.52
territ    -0.52
Positive Logits
Token     Logit
ing        2.70
ed         2.15
edIn       1.59
edly       1.56
ment       1.52
ership     1.52
ING        1.42
ation      1.39
ions       1.39
ments      1.38
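The logit tables list the vocabulary tokens whose output logits this feature raises or lowers most. Below is a minimal sketch of how such lists can be produced, assuming a token's score is the dot product of the feature's decoder direction with that token's unembedding vector (ignoring the final LayerNorm). The function name `logit_effects`, the dict-based unembedding, and the toy vectors are hypothetical illustrations, not this dashboard's actual implementation or data.

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score every token by projecting the feature's decoder direction
    onto that token's unembedding vector (assumption: plain dot product,
    no LayerNorm). Returns (top_positive, top_negative)."""
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    # Sort ascending by score: the front holds the most negative tokens.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    top_negative = ranked[:k]
    # Reverse so the most positive tokens come first.
    top_positive = ranked[::-1][:k]
    return top_positive, top_negative


# Toy 2-dimensional example (hypothetical vectors, not real model weights).
decoder = [1.0, -0.5]
unembed = {"ing": [2.0, -1.0], "ed": [1.5, -1.0], "reper": [-0.5, 0.5]}
pos, neg = logit_effects(decoder, unembed, k=1)
print(pos)  # [('ing', 2.5)]
print(neg)  # [('reper', -0.75)]
```

A real model would use a matrix product over the full vocabulary; the dict form just keeps the sketch dependency-free.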
Activation density: 2.367%
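Activation density is read here as the percentage of token positions on which the feature is active, so 2.367% corresponds to roughly 1 token in 42. A minimal sketch, assuming "active" means an activation strictly above a threshold of zero; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions whose activation exceeds
    `threshold` (assumption: density counts strictly positive firings)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# One of four tokens fires, giving a density of 25%.
print(activation_density([0.0, 1.2, 0.0, 0.0]))  # 25.0
```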