Explanations
phrases indicating significant impacts or consequences
Negative Logits
605                 -0.15
ertas               -0.14
035                 -0.13
ReadOnly            -0.13
experiencing        -0.13
лиÑĪ                 -0.13
elor                -0.13
ActionController    -0.13
805                 -0.13
Ñģл                  -0.13
Positive Logits
impact              0.33
effect              0.32
Impact              0.28
Impact              0.26
impact              0.26
knock               0.25
profound            0.24
Effect              0.23
bearing             0.22
adverse             0.22
Activations Density: 0.051%