Explanations
references to political issues and calls for action
Negative Logits
eventuell        -0.55
supuestamente    -0.54
Possibly         -0.50
allegedly        -0.50
theoretically    -0.50
Theore           -0.49
evtl             -0.49
supposedly       -0.49
Apparently       -0.48
Possibly         -0.47
Positive Logits
protections      0.59
bugün            0.53
hardworking      0.53
memastikan       0.53
rightly          0.52
noastre          0.52
dobbiamo         0.52
underscores      0.52
essenziale       0.51
chiare           0.51
Activations Density: 0.475%