Explanations
statements about the current state or progress of various situations
phrases indicating the state of affairs or current conditions
Negative Logits
[+]         -0.71
(>          -0.66
emark       -0.65
Tens        -0.65
pora        -0.64
odied       -0.63
affe        -0.63
ewitness    -0.63
raint       -0.63
¿½          -0.62
Positive Logits
alright      0.91
icably       0.89
downhill     0.87
differently  0.85
fine         0.84
hun          0.82
ok           0.81
OK           0.78
sorted       0.77
okay         0.77
Activations Density 0.134%