Explanations
time-related actions or events
expressions related to the progression or conclusion of events over time
Negative Logits
ãĥ¢ (モ)  -0.72
kidding  -0.71
#$#$  -0.68
actly  -0.62
unless  -0.59
ãĤ± (ケ)  -0.58
altogether  -0.58
èĢħ (者)  -0.56
################################  -0.56
discrimination  -0.56
Positive Logits
nearer  1.07
closer  1.02
deeper  0.89
increasingly  0.86
wealthier  0.85
richer  0.84
tighter  0.82
farther  0.79
thinner  0.76
colder  0.76
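The two lists above rank vocabulary tokens by how much this feature raises or lowers their logits. A minimal sketch of how such a ranking can be produced, assuming the value for each token is the dot product of the feature's decoder direction with that token's unembedding vector (this ignores the final layer norm, which a real dashboard may handle differently). The helper names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical, made up for illustration only.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Direct logit contribution of the feature (activation = 1) to every token.

    Assumption: effect = decoder_dir . unembedding_column(token), no layer norm.
    """
    return {tok: sum(d * u for d, u in zip(decoder_dir, col)) for tok, col in unembed.items()}


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most promoted tokens and the k most suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Hypothetical 3-dimensional toy model; not real model weights.
decoder_dir = [1.0, 0.5, -0.5]
unembed = {
    "closer": [0.8, 0.4, 0.0],
    "deeper": [0.6, 0.2, -0.2],
    "unless": [-0.4, 0.0, 0.4],
    "kidding": [-0.2, -0.4, 0.2],
}

pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("positive:", pos)
print("negative:", neg)
```

Sorting once and slicing both ends keeps the sketch short; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.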
Activation Density: 0.111%
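Read as a fraction, a density of 0.111% means the feature is active on roughly 1.1 of every 1,000 tokens, or about one token in 900. A minimal sketch of the calculation, assuming density is the share of token positions whose activation is strictly above zero over some sample of text (the dashboard's actual sample and threshold are not stated here). The function name `activation_density` and the activation data are hypothetical.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds threshold.

    Assumption: "active" means activation > threshold (default 0.0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Made-up example: 100,000 tokens, with the feature firing on every 900th one.
acts = [0.0] * 100_000
for i in range(0, 111 * 900, 900):  # 111 positions: 0, 900, ..., 99000
    acts[i] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # 0.111%
```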