INDEX
Explanations
declarative statements involving knowledge or belief, often related to technology or information sharing
phrases indicating people's awareness or lack thereof regarding various topics
Negative Logits
Advertisement   -0.73
=/              -0.66
Sprite          -0.65
bearer          -0.65
ciation         -0.64
Humane          -0.64
Savannah        -0.64
Resurrection    -0.63
Deadly          -0.63
contestant      -0.63
Positive Logits
swear           0.80
mistakenly      0.78
misconceptions  0.78
unaware         0.76
wondering       0.74
perceive        0.73
poorer          0.73
underestimate   0.72
dissatisfied    0.72
experiencing    0.72
Activations Density: 0.333%