INDEX
Explanations
things related to physical locations or directions
significant numerical references or measurements within the text
Negative Logits
owship      -0.59
uncond      -0.55
withd       -0.54
orsi        -0.52
abwe        -0.52
mosqu       -0.52
inez        -0.51
ascus       -0.51
rongh       -0.51
Islamic     -0.51
Positive Logits
airs        0.66
aside       0.66
hindsight   0.62
?)          0.61
trivia      0.59
disclaimer  0.59
caveat      0.57
nods        0.56
wise        0.54
irony       0.54
Activations Density: 1.681%