INDEX
Explanations
declarative statements preceded by "the truth is" or similar phrases
statements emphasizing truth or factual assertions
Negative Logits
ebus          -0.92
yip           -0.76
robe          -0.76
mods          -0.73
reciation     -0.69
oslav         -0.67
ONSORED       -0.66
pload         -0.65
Interstitial  -0.65
fever         -0.64
Positive Logits
ACTED         0.71
Roe           0.71
etheless      0.69
matter        0.68
matter        0.65
çİĭ           0.65
Franken       0.64
Lange         0.64
relevance     0.62
reality       0.62
Activations Density 0.149%