Explanations
phrases related to reasons or causes
phrases that indicate conditional or causal relationships in the text
Negative Logits
ocation    -0.62
warm       -0.60
uptake     -0.60
farm       -0.59
lick       -0.59
rises      -0.58
mailing    -0.58
nexus      -0.56
wn         -0.56
cano       -0.55
Positive Logits
Spoiler    0.76
RANT       0.65
û          0.65
Bour       0.62
udeb       0.61
haps       0.58
guessed    0.57
IPS        0.57
Jong       0.56
ovi        0.56
Activations Density: 0.161%