INDEX
Explanations
temporal indicators related to events or actions
after and before
Negative Logits
reafon    -0.45
Ros       -0.39
듦        -0.39
reaſon    -0.39
magin     -0.38
living    -0.37
ROS       -0.36
mathbb    -0.35
houſe     -0.35
Bbb       -0.35
Positive Logits
After      0.82
after      0.81
After      0.81
after      0.76
после      0.75
AFTER      0.72
después    0.71
setelah    0.71
katapos    0.69
AFTER      0.68
Activations Density: 0.029%