Explanations
references to significant historical events or societal changes
Negative Logits
complies  -0.45
avoid     -0.45
avoid     -0.41
remain    -0.41
avoiding  -0.40
preven    -0.39
Avoiding  -0.39
remained  -0.38
await     -0.38
избе      -0.37
Positive Logits
trouxe    0.69
mengubah  0.62
bring     0.59
brings    0.59
bringing  0.58
brought   0.57
Bring     0.57
Bring     0.56
trajo     0.55
exposing  0.54
Activations Density 0.119%