INDEX
Explanations
phrases indicating judgment or warning
Negative Logits
947      -0.17
aney     -0.17
879      -0.16
751      -0.15
876      -0.15
braco    -0.15
/REC     -0.15
á¿Ĩ      -0.15
863      -0.14
fleet    -0.14
Positive Logits
Plates   0.23
Nep      0.23
Omni     0.23
Ether    0.22
Alma     0.21
Jared    0.21
Ether    0.20
plates   0.20
plates   0.19
Hel      0.19
Activations Density 0.002%