Explanations
temporal indicators or phrases that denote time
Negative Logits
pleaſure  -0.73
unſ  -0.63
Reſ  -0.61
myſelf  -0.57
raiſ  -0.57
deſt  -0.57
faſt  -0.56
صوتيه  -0.55
fubject  -0.55
neceſſ  -0.54
Positive Logits
gdyż  0.78
because  0.76
ponieważ  0.73
poiché  0.70
because  0.66
Because  0.65
eftersom  0.65
因为  0.65
Porque  0.65
Because  0.64
Activations Density 0.279%