INDEX
Explanations
phrases related to actions and intentions of people
references to individuals and their actions or claims
Negative Logits
Flight       -0.69
artifacts    -0.63
taboola      -0.63
endif        -0.61
gettable     -0.61
NX           -0.61
ahoo         -0.60
else         -0.60
Fine         -0.59
ival         -0.58
Positive Logits
championed    1.02
profess       0.96
purportedly   0.95
supposedly    0.91
esp           0.91
ostensibly    0.91
purported     0.88
preached      0.84
nurt          0.82
preach        0.81
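The dashboard does not say how the logit values above are computed. In many sparse-autoencoder feature dashboards they are the feature's decoder direction projected through the model's unembedding matrix (a "logit lens" view of the feature's direct effect), and the sketch below assumes that convention. The function name `top_logit_effects`, the toy matrices, and the four-token vocabulary are hypothetical and chosen only for illustration.

```python
import numpy as np


def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Return the k most suppressed and k most promoted tokens for one feature.

    Assumption: the listed logits are the feature's decoder row (shape d_model)
    multiplied by the unembedding matrix W_U (shape d_model x d_vocab).
    """
    # Direct effect of the feature direction on every vocabulary logit.
    logit_effect = w_dec_row @ W_U  # shape: (d_vocab,)
    order = np.argsort(logit_effect)  # ascending: most negative first
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return negative, positive


# Hypothetical toy example: d_model = 3, vocabulary of 4 tokens.
w_dec_row = np.array([1.0, 0.0, -1.0])
W_U = np.array([
    [1.0, 0.2, 0.5, -1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.4, 1.0],
])
vocab = ["a", "b", "c", "d"]

negative, positive = top_logit_effects(w_dec_row, W_U, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

Sorting the effect vector once and reading it from both ends yields the negative and positive lists together, which matches how the two columns above pair the most suppressed and most promoted tokens.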
Activations Density 0.206%
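Activation density is most plausibly the share of token positions on which the feature fires; 0.206% corresponds to roughly 2 in every 1,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above zero (as with a ReLU encoder); the function name `activation_density` and the toy activation array are hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is > threshold.
    """
    acts = np.asarray(activations)
    return float(np.mean(acts > threshold))


# Hypothetical example: 1,000 token positions, feature active on 2 of them.
acts = np.zeros(1000)
acts[[10, 500]] = [0.7, 1.3]

density = activation_density(acts)
label = f"{density:.3%}"
print(label)
```

The same function applied to a long sample of model activations would give a figure directly comparable to the 0.206% shown above, provided the dashboard uses the same firing threshold.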