INDEX
Explanations
expressions of personal opinions or beliefs
Negative Logits
Diſ             -0.64
pleaſure        -0.62
Reſ             -0.58
electrica       -0.57
Efq             -0.57
houſe           -0.56
Houſe           -0.54
Inſ             -0.53
Bede            -0.51
devis           -0.50
Positive Logits
glaube          1.07
probably        0.99
chyba           0.92
Probably        0.88
ungkin          0.87
probablemente   0.86
creo            0.85
mutlich         0.85
provavelmente   0.84
probably        0.83
Activations Density: 0.072%