INDEX
Explanations
references to specific content or citations within a text
Negative Logits
wildlife     -0.79
retard       -0.74
flow         -0.72
lull         -0.72
daily        -0.72
clin         -0.72
intensive    -0.71
park         -0.71
star         -0.69
prey         -0.69
Positive Logits
including     1.69
although      1.66
see           1.64
which         1.64
excluding     1.61
except        1.61
albeit        1.61
usually       1.56
especially    1.54
sic           1.52
Activations Density 0.321%