INDEX
Explanations
temporal references related to time and distance
Negative Logits
akan      -0.16
Ont       -0.15
ORY       -0.15
Mocks     -0.15
atern     -0.15
sher      -0.14
828       -0.13
ainment   -0.13
ungan     -0.13
ont       -0.13
Positive Logits
away      0.82
Away      0.68
away      0.65
Away      0.59
-away     0.53
apart     0.35
weg       0.32
aways     0.31
entfer    0.25
Apart     0.24
Activations Density 0.044%