Explanations
the specific word "time" in sentences
references to temporal expressions or the concept of time
Negative Logits
avorite   -0.84
ancial    -0.77
ortium    -0.75
warts     -0.65
platoon   -0.64
ging      -0.63
ourt      -0.62
ebted     -0.61
rosis     -0.61
raltar    -0.61
Positive Logits
frames    0.81
ZI        0.75
spe       0.68
IAS       0.65
liest     0.63
EMS       0.63
frame     0.62
code      0.61
zone      0.59
龍        0.59
Activations Density 0.031%
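
As a rough illustration of how a density figure like the one above can be read, the sketch below assumes that "activations density" means the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the toy data are all assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density is defined as (active positions) / (all positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 3 of which activate the feature.
toy = [0.0] * 10_000
for i in (12, 4_500, 9_999):
    toy[i] = 1.7

density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.030%
```

Under this reading, a density of 0.031% would correspond to the feature firing on roughly 3 of every 10,000 tokens.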