Explanations
Time-related references, such as specific months, days, and years
Temporal references or time-related terms
Negative logits
Token       Logit
GY          -0.61
sew         -0.60
ISH         -0.60
orbit       -0.55
=>          -0.54
rotation    -0.54
RY          -0.54
TOR         -0.53
rotate      -0.53
Cooldown    -0.53

Positive logits
Token       Logit
stating      0.85
reassuring   0.83
announcing   0.83
saying       0.81
outlining    0.80
describing   0.78
Saying       0.76
urging       0.75
praising     0.74
citing       0.73
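The readout does not say how the two logit lists were produced. A common approach in feature dashboards is to take the dot product of the feature's decoder direction with each token's unembedding vector, then list the most negative and most positive tokens. The following is a minimal sketch under that assumption only; the function names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical, not part of the source.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot the feature's decoder direction with each
    token's unembedding vector to get that token's direct logit effect."""
    effects = {}
    for tok, vec in unembed.items():
        if len(vec) != len(decoder_dir):
            raise ValueError(f"dimension mismatch for token {tok!r}")
        effects[tok] = sum(d * u for d, u in zip(decoder_dir, vec))
    return effects


def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Toy 2-dimensional example (assumed values, not real model weights).
effects = logit_effects([1.0, 0.5], {
    "stating": [0.6, 0.5],
    "orbit": [-0.4, -0.3],
    "the": [0.0, 0.1],
})
top, bottom = top_and_bottom(effects, k=1)
print(top, bottom)
```

With real weights, `decoder_dir` would be one row of the autoencoder's decoder and `unembed` the model's unembedding columns; the ranking step is the same regardless of vocabulary size.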

Activation density: 0.211%
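Activation density in dashboards like this is usually the percentage of sampled tokens on which the feature fires; at 0.211%, that is roughly 2 tokens in every 1,000. The sketch that follows assumes "fires" means an activation strictly above a threshold of 0 (the `threshold` parameter is a hypothetical knob, not something the source specifies).

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation is strictly above
    `threshold` (assumed definition of "firing")."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# 3 firing tokens out of 1,000 sampled tokens.
density = activation_density([0.0] * 997 + [1.2, 0.4, 3.0])
print(f"{density:.3f}%")
```

Because most SAE features are sparse, a density well under 1% like the one reported here is typical; raising `threshold` only lowers the figure.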