INDEX
Explanations
phrases where something is mentioned or identified with a specific name or term
instances of phrases indicating reference or attribution
Negative Logits
incentive      -0.58
osion          -0.57
acity          -0.56
Clicker        -0.56
iscopal        -0.54
redo           -0.52
Liver          -0.52
allion         -0.52
Frie           -0.51
foreseeable    -0.50
Positive Logits
hereafter      1.04
as             0.99
herein         0.98
loosely        0.97
derog          0.94
collectively   0.93
simply         0.89
isively        0.87
literally      0.84
internally     0.83
Activations Density: 0.078%