Explanations
texts with unusual characters, such as "Ċ", possibly indicating some unique formatting or encoding issue
sections of text that mention statistics or data related to societal issues
Negative Logits
spitting    -0.92
yours       -0.86
ninja       -0.84
shelling    -0.84
grip        -0.84
swe         -0.83
swoop       -0.83
neighb      -0.83
craz        -0.82
rul         -0.80
Positive Logits
References      1.68
Methods         1.65
Figure          1.62
Recent          1.62
Introduction    1.58
FIG             1.54
Conclusion      1.54
Background      1.53
Discussion      1.53
CONCLUS         1.53
Activations Density: 0.435%