INDEX
Explanations
references to significant events or relationships in personal narratives
Negative Logits
/Dk        -0.15
°}         -0.13
başına     -0.13
рещ        -0.13
ouce       -0.13
eczy       -0.13
artık      -0.13
arer       -0.13
Annunci    -0.12
uo         -0.12
Positive Logits
during       1.30
during       1.13
durante      1.07
During       1.07
During       1.01
během        0.74
während      0.73
tijd         0.69
خلال         0.58
sırasında    0.54
Activations Density 0.977%