INDEX
Explanations
statements of acceptance or agreement related to various subjects
Negative Logits
vru               -0.44
protoimpl         -0.43
dormitórios       -0.40
bepaalde          -0.40
ModelExpression   -0.39
poils             -0.37
zoude             -0.37
fornece           -0.36
gedrag            -0.36
gräns             -0.35
Positive Logits
okay     0.84
genre    0.71
okay     0.66
mean     0.64
OKAY     0.64
earth    0.63
Earl     0.62
genres   0.61
Okay     0.59
Earl     0.59
Activations Density 0.281%