INDEX
Explanations
phrases related to discussions and statements of opinion
empty or blank sections in the text
Negative Logits
destro       -0.84
redes        -0.72
referen      -0.71
conclud      -0.70
advoc        -0.70
avorite      -0.70
ADRA         -0.69
disadvant    -0.68
enegger      -0.66
encount      -0.64
Positive Logits
please       0.55
âľ           0.55
println      0.55
?!           0.54
ËĪ           0.54
partName     0.54
Reck         0.53
!            0.52
ye           0.52
dont         0.51
Activations Density: 0.515%