INDEX
Explanations
phrases where the word "this" is mentioned prominently
instances of the word "this."
Negative Logits
ARS          -0.78
utical       -0.73
letters      -0.73
okers        -0.72
aughtered    -0.72
arest        -0.71
hens         -0.71
acers        -0.71
pps          -0.70
asures       -0.70
Positive Logits
week          0.99
trope         0.95
applies       0.91
happens       0.89
morning       0.88
BEFORE        0.86
month         0.86
behaviour     0.85
phenomenon    0.85
anecd         0.85
Activations Density: 0.080%
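
For readers unfamiliar with the density figure, the following is a minimal sketch of how such a value could be computed. It assumes that density means the fraction of sampled tokens on which the feature's activation is above zero. The function name `activation_density`, the threshold, and the sample data are all hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The dashboard's exact definition is not stated.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample for illustration only: 8 active tokens out of 10,000.
sample = [0.0] * 9992 + [1.3, 0.4, 2.1, 0.7, 0.2, 3.0, 0.9, 1.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.080% corresponds to the feature firing on roughly 8 of every 10,000 tokens.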