INDEX
Explanations
numeric information related to statistics, quantities, or counts
occurrences of numbers or statistics in the text
Negative Logits
amaru       -0.66
¯           -0.62
Redditor    -0.61
FontSize    -0.61
ours        -0.59
feature     -0.59
slogan      -0.59
elf         -0.59
fame        -0.58
theirs      -0.58
Positive Logits
%            1.05
percent      0.99
consecutive  0.88
th           0.88
instances    0.87
%,           0.82
separate     0.81
%-           0.81
00           0.81
81           0.80
Activations Density 0.195%