Explanations
large numbers referencing quantities like studies, people, and measurements
mentions of large numerical quantities, particularly in the context of studies or data
Negative Logits
Token       Logit
matt        -0.63
behav       -0.59
Goodwin     -0.54
jo          -0.53
bothered    -0.53
↵Âł         -0.52
pains       -0.52
nab         -0.51
offender    -0.51
HC          -0.51
Positive Logits
Token       Logit
000         1.67
500         1.38
700         1.33
800         1.33
600         1.29
400         1.25
300         1.24
200         1.22
900         1.19
dozen       1.12
Activations Density 0.058%