INDEX
Explanations
expressions of personal opinions and feelings
Negative Logits
afone         -0.18
icina         -0.17
offered       -0.15
леж           -0.15
intervening   -0.15
andas         -0.15
ittest        -0.15
appealed      -0.14
awe           -0.14
amework       -0.14
Positive Logits
found         0.24
enjoyed       0.21
found         0.21
Found         0.19
-found        0.19
Found         0.19
overall       0.19
(found        0.18
Overall       0.18
_found        0.18
Activations Density 0.100%