INDEX
Explanations
phrases indicating simultaneous events or actions
references to the concept of time
Negative Logits
nce             -0.73
iland           -0.67
Founders        -0.67
ipedia          -0.66
ifer            -0.66
fingert         -0.65
psey            -0.64
ged             -0.63
ilater          -0.61
halla           -0.60
Positive Logits
女              0.84
respecting      0.76
emphasizing     0.69
embracing       0.69
minimizing      0.68
anticipating    0.67
ignoring        0.67
keeping         0.66
releasing       0.65
acknowledging   0.65
Activations Density 0.029%