INDEX
Explanations
questions about information and explanations
questions related to understanding actions, motivations, and characteristics of subjects
Negative Logits
idon      -0.67
iva       -0.66
IFIED     -0.65
ocaust    -0.65
<         -0.65
rome      -0.63
avor      -0.63
pez       -0.62
POR       -0.61
OV        -0.61
Positive Logits
soever          1.21
accordingly     0.92
abouts          0.89
thereof         0.87
consequ         0.80
consequently    0.78
they            0.75
lengths         0.75
obstacles       0.73
pitfalls        0.72
Activations Density 0.132%