INDEX
Explanations
terms related to exploitation and coercive relationships
Negative Logits
Rüyada      -0.46
zdan        -0.41
Và          -0.39
coinciden   -0.38
вида        -0.38
Niemand     -0.37
soal        -0.37
OrNil       -0.37
mimo        -0.37
sprechend   -0.36
Positive Logits
ſta         0.60
ſelf        0.59
pinulongan  0.58
            0.58
Anſ         0.56
vrijwilli   0.55
ſch         0.55
hyö         0.54
raiſ        0.53
ſtra        0.52
Activations Density 0.953%