INDEX
Explanations
short phrases related to actions or tasks
mentions of dates in the context of historical events or timelines
Negative Logits
Token      Logit
ゴン       -0.73
Cause      -0.67
ullivan    -0.66
DN         -0.66
zar        -0.65
FIX        -0.65
abil       -0.64
Topic      -0.64
grain      -0.64
arah       -0.62
Positive Logits
Token      Logit
however    1.03
meanwhile  0.92
Mehran     0.72
though     0.69
Kahn       0.68
although   0.66
according  0.66
when       0.66
moreover   0.66
Shank      0.62
Activations Density 0.183%
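
The page does not define the density figure. A minimal sketch of one common reading, assuming "activations density" means the fraction of sampled tokens on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to illustrate the arithmetic:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: density = (tokens where the feature fires) / (tokens sampled).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 183 of 100,000 tokens.
sample = [1.0] * 183 + [0.0] * (100_000 - 183)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.183%
```

Under this reading, a density of 0.183% means the feature is sparse: it fires on roughly 2 of every 1,000 sampled tokens.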