Explanations
references to actions or events related to someone other than the speaker
references to others in the context of dependency or comparison
Negative Logits
Token          Logit
DOS            -0.73
ffee           -0.70
Encyclopedia   -0.69
ENTS           -0.68
Dictionary     -0.66
ESE            -0.65
ULE            -0.64
Lee            -0.64
Terror         -0.63
ARS            -0.63
Positive Logits
Token          Logit
worldly        1.65
than           0.82
intangible     0.74
swer           0.73
uctor          0.73
inki           0.70
entirely       0.70
dimensional    0.70
aterial        0.69
limb           0.68
Activations Density: 0.058%