INDEX
Explanations
environmental and energy-related words
Negative Logits
Token     Logit
Trib      -0.73
amen      -0.69
Raptors   -0.67
***       -0.67
142       -0.67
Tud       -0.66
Schwarz   -0.66
Amen      -0.65
TRI       -0.65
QU        -0.65
Positive Logits
Token     Logit
el        1.48
il        1.41
eling     1.34
els       1.31
illian    1.29
iling     1.26
ill       1.22
ils       1.20
iler      1.20
EL        1.17
Activations Density: 0.086%